[GH-ISSUE #14136] Improve ollama pull to handle large models on slow connections #71281

Open
opened 2026-05-05 01:06:32 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @extreme4all on GitHub (Feb 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14136

Ollama Pull: Improving Downloads for Slow Connections

Transparency: co-created with AI

Currently, ollama pull splits blobs into a fixed number of parts (16 by default) with a minimum/maximum part size of 100 MB–1 GB. This can cause problems for users with slow or unstable internet connections, where downloading a single large part (e.g., 1 GB+) may stall or fail while showing little to no progress. I also suggest changing the default minimum/maximum part size to 1 MB/100 MB.

Proposed Solution

Introduce user-configurable flags for download tuning:

  • --parallelism <n> – maximum number of concurrent part downloads
  • --partial-max-size <size> – maximum size of each part
  • --partial-min-size <size> (optional) – minimum size of each part

Behavior

  • Number of parts is computed dynamically based on blob size and the user-specified partial-max-size.
  • Chunks are evenly sized within the min/max constraints.
  • Users with good internet can increase --partial-max-size for fewer, larger chunks.
  • Users with poor internet can lower --partial-max-size to ensure smaller, resumable chunks.
  • --partial-min-size can optionally be omitted to simplify usage.
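The dynamic part-size computation described above could be sketched as follows. This is a hedged sketch of the proposal, not existing Ollama code; computePartSize and its parameters are illustrative names, and the proposed --partial-max-size / --partial-min-size options do not exist yet.

```go
package main

import (
	"fmt"
	"math"
)

// computePartSize splits a blob of blobSize bytes into evenly sized parts,
// honoring the proposed --partial-max-size / --partial-min-size bounds.
// A minPartSize of 0 means "no minimum".
func computePartSize(blobSize, maxPartSize, minPartSize int64) (numParts int, partSize int64) {
	numParts = int(math.Ceil(float64(blobSize) / float64(maxPartSize)))
	partSize = blobSize / int64(numParts)
	switch {
	case partSize > maxPartSize:
		partSize = maxPartSize
	case minPartSize > 0 && partSize < minPartSize:
		partSize = minPartSize
	}
	return numParts, partSize
}

func main() {
	const MB = int64(1 << 20)
	// A 4750 MB blob with a 100 MB cap yields 48 roughly even ~99 MB parts.
	n, size := computePartSize(4750*MB, 100*MB, 1*MB)
	fmt.Println(n, size)
}
```

A user on a poor connection could then lower the cap to, say, 10 MB and get many small, individually resumable parts from the same formula.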

Example Implementation in Go

https://github.com/ollama/ollama/blob/main/server/download.go

// https://github.com/ollama/ollama/blob/main/server/download.go#L128
func (b *blobDownload) Prepare(ctx context.Context, requestURL *url.URL, opts *registryOptions) error {
	partFilePaths, err := filepath.Glob(b.Name + "-partial-*")
	if err != nil {
		return err
	}

	b.done = make(chan struct{})

	// TODO: validate whether the min / max size have changed; if they have, then discard existing parts
	for _, partFilePath := range partFilePaths {
		part, err := b.readPart(partFilePath)
		if err != nil {
			return err
		}

		b.Total += part.Size
		b.Completed.Add(part.Completed.Load())
		b.Parts = append(b.Parts, part)
	}
	// https://github.com/ollama/ollama/blob/main/server/download.go#L147
	if len(b.Parts) == 0 {
		resp, err := makeRequestWithRetry(ctx, http.MethodHead, requestURL, nil, nil, opts)
		if err != nil {
			return err
		}
		defer resp.Body.Close()

		b.Total, _ = strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
		blobSize := b.Total

		// Compute number of parts dynamically based on user-specified max size
		maxPartSize := opts.PartialMaxSize // TODO: default, e.g., 100 * format.MegaByte
		minPartSize := opts.PartialMinSize // TODO: default, e.g., 1 * format.MegaByte
		numParts := int(math.Ceil(float64(blobSize) / float64(maxPartSize)))

		// Compute an even part size, clamped to the min/max constraints
		partSize := blobSize / int64(numParts)
		switch {
		case partSize > maxPartSize:
			partSize = maxPartSize
		case minPartSize > 0 && partSize < minPartSize:
			partSize = minPartSize
		}

		// Generate parts
		var offset int64
		for offset < blobSize {
			size := partSize
			if offset+size > blobSize {
				size = blobSize - offset
			}

			if err := b.newPart(offset, size); err != nil {
				return err
			}
			offset += size
		}
	}
	if len(b.Parts) > 0 {
		slog.Info(fmt.Sprintf("downloading %s in %d %s part(s)", b.Digest[7:19], len(b.Parts), format.HumanBytes(b.Parts[0].Size)))
	}

	return nil
}
// Set Concurrency
// https://github.com/ollama/ollama/blob/main/server/download.go#L331
func (b *blobDownload) run(ctx context.Context, requestURL *url.URL, opts *registryOptions) error {
	// code
	g.SetLimit(opts.Parallelism) // user can tune parallelism
}

Benefits

  • Resumable downloads with smaller chunks for unreliable networks
  • Customizable for different connection speeds
  • Progress is visible more consistently for large blobs
GiteaMirror added the feature request label 2026-05-05 01:06:32 -05:00
Author
Owner

@extreme4all commented on GitHub (Feb 7, 2026):

For anyone actually running into this issue: you can download the model's blobs manually yourself. You need to be in the models/blobs directory; FYI, if you run Ollama in a container you should mount this directory as a volume.

you can get the manifest via

  • curl https://registry.ollama.ai/v2/library/<model_name>/manifests/<tag> | jq ., then find the digests of the large files and substitute them below
  • cd models/blobs
  • curl -L -C - -o <digest> https://registry.ollama.ai/v2/library/<model_name>/blobs/<digest>
Author
Owner

@extreme4all commented on GitHub (Feb 7, 2026):

Many users have reported related issues:

  • #10050 Slow downloads and stalled parts
  • #3786 Timeout errors on slow connections
  • #3162 Requests to disable max retries on slow networks
  • #1736 Download slows near completion
  • #8530 Pull hangs or fails midway
  • #6211 Max retries exceeded on slow ADSL connections

An additional feature could be a --max-retries flag: if the user sets 0 or -1, it will just keep retrying, but I would print warnings to stdout if it retries more than x times and suggest, for example, lowering --partial-max-size.

Author
Owner

@rick-github commented on GitHub (Feb 7, 2026):

Or use the existing experimental client with OLLAMA_EXPERIMENT=client2 and OLLAMA_REGISTRY_MAXSTREAMS=1.

Author
Owner

@extreme4all commented on GitHub (Feb 7, 2026):

Or use the existing experimental client with OLLAMA_EXPERIMENT=client2 and OLLAMA_REGISTRY_MAXSTREAMS=1.

I think it's part of the problem. For me it's my crappy internet (very bad wifi), so lots of packet loss and periods with zero connectivity; I noticed that in many cases it's hard to get a 1 GB file downloaded, hence why smaller, configurable partial downloads may be useful.

It seems to me the change in the experimental client mostly addresses some connection handling, where a healthy connection is not retried if it doesn't send any data.

Reference: github-starred/ollama#71281