[PR #15219] server: add download bandwidth rate limiting via OLLAMA_MAX_DOWNLOAD_SPEED #25625

opened 2026-04-19 18:19:07 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15219
Author: @dhirajlochib
Created: 4/2/2026
Status: 🔄 Open

Base: main ← Head: server/rate-limit-download-speed


📝 Commits (2)

  • 71baa05 server: add download bandwidth rate limiting via OLLAMA_MAX_DOWNLOAD_SPEED
  • 388778e server: suppress containedctx lint for rate limited reader

📊 Changes

6 files changed (+277 additions, -1 deletions)


📝 envconfig/config.go (+1 -0)
📝 go.mod (+1 -0)
📝 go.sum (+2 -0)
📝 server/download.go (+2 -1)
➕ server/ratelimit.go (+128 -0)
➕ server/ratelimit_test.go (+143 -0)

📄 Description

Summary

Adds support for limiting download bandwidth when pulling models via a new OLLAMA_MAX_DOWNLOAD_SPEED environment variable. This addresses the long-standing request from users on shared or metered networks who need to cap Ollama's download speed.

Closes #2006

Changes

  • server/ratelimit.go: New file implementing a shared rate.Limiter (token bucket) that wraps io.Reader to throttle all concurrent download chunks at a single aggregate bandwidth limit.
  • server/ratelimit_test.go: Tests for speed string parsing (parseDownloadSpeed), nil-limiter passthrough, rate enforcement, context cancellation, and EOF preservation.
  • server/download.go: Wraps resp.Body in downloadChunk() with the rate-limited reader so all 16 concurrent download parts share the same bandwidth budget.
  • envconfig/config.go: Registers OLLAMA_MAX_DOWNLOAD_SPEED in AsMap() so it appears in ollama show and documentation.
  • go.mod/go.sum: Adds golang.org/x/time dependency for the token-bucket rate limiter.

Usage

# Limit downloads to 10 MB/s
OLLAMA_MAX_DOWNLOAD_SPEED=10m ollama pull llama3.2

# Limit to 500 KB/s
OLLAMA_MAX_DOWNLOAD_SPEED=500k ollama pull llama3.2

# Limit to 1 GB/s
OLLAMA_MAX_DOWNLOAD_SPEED=1g ollama pull llama3.2

Accepted formats: plain bytes (1048576), with suffix (10m, 100k, 1g), or with full suffix (10mb, 100kb/s). Case-insensitive.
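
As an illustration of the accepted formats, here is a minimal sketch of a speed-string parser. It is not the PR's parseDownloadSpeed: the name parseSpeed is hypothetical, and the choice of binary multiples (1k = 1024 bytes) is an assumption based on the plain-bytes example 1048576. Overflow handling is omitted for brevity.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSpeed converts a string such as "10m", "100kb/s", or "1048576" into
// bytes per second. Hypothetical helper; the PR's parser may behave differently.
// Assumption: suffixes are binary multiples (k = 1024, m = 1024^2, g = 1024^3).
func parseSpeed(input string) (int64, error) {
	s := strings.ToLower(strings.TrimSpace(input)) // case-insensitive
	if s == "" {
		return 0, nil // unset means "no limit"
	}
	s = strings.TrimSuffix(s, "/s") // "100kb/s" -> "100kb"
	s = strings.TrimSuffix(s, "b")  // "100kb"   -> "100k"

	mult := int64(1)
	switch {
	case strings.HasSuffix(s, "k"):
		mult, s = 1<<10, strings.TrimSuffix(s, "k")
	case strings.HasSuffix(s, "m"):
		mult, s = 1<<20, strings.TrimSuffix(s, "m")
	case strings.HasSuffix(s, "g"):
		mult, s = 1<<30, strings.TrimSuffix(s, "g")
	}

	n, err := strconv.ParseInt(strings.TrimSpace(s), 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid download speed %q", input)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"10m", "500k", "1g", "1048576", "100KB/s"} {
		n, err := parseSpeed(in)
		fmt.Println(in, n, err)
	}
}
```

Returning 0 for an empty string mirrors the described behavior that an unset or 0 value disables limiting.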

Design

  • Uses a single shared rate.Limiter across all 16 concurrent download chunks, ensuring the aggregate bandwidth stays within the limit regardless of parallelism.
  • The limiter uses a token-bucket algorithm (golang.org/x/time/rate) with a burst size capped at 512 KB for smooth throughput.
  • When OLLAMA_MAX_DOWNLOAD_SPEED is unset or 0, no limiter is created and downloads run at full speed (zero overhead).

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 18:19:07 -05:00
Reference: github-starred/ollama#25625