[PR #15489] server: verify blob downloads with an inline sha256 hasher #46420

Open
opened 2026-04-25 01:51:31 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15489
Author: @mverrilli
Created: 4/10/2026
Status: 🔄 Open

Base: main ← Head: fix/inline-hash-blob-downloads


📝 Commits (1)

  • 50d1006 server: verify blob downloads with an inline sha256 hasher

📊 Changes

3 files changed (+143 additions, -13 deletions)

View changed files

📝 envconfig/config.go (+8 -0)
📝 server/download.go (+128 -11)
📝 server/images.go (+7 -2)

📄 Description

Summary

Adds an opt-in OLLAMA_VERIFY_INLINE_HASH=1 mode that computes a blob's sha256 via io.TeeReader on the HTTP response body as it streams into the file, and compares the resulting digest to the expected one before renaming the -partial file. When the env var is set, the download path never reads the file back from disk to verify.

The default code path is unchanged.

Motivation

On one machine I see a reproducible Error: digest mismatch, file must be downloaded again: want sha256:AAA..., got sha256:BBB... on ollama run gemma4:31b (19.8 GB), where BBB is a different wrong hash on every retry. Retries don't converge, and neither does ollama pull.

I traced the failure to the post-download verifyBlob() call in server/images.go, which re-opens the blob file and hashes it via io.Copy(sha256.New(), f). While debugging I confirmed, separately:

  • The HTTP response body on the wire delivers the correct bytes (an inline sha256.New() tee'd off resp.Body during download matches the expected digest).
  • curl piped to sha256sum of the same URL produces the expected digest.
  • The blob on disk is at least functional: ollama run gemma4:31b "hi" loads and generates a coherent response, so llama.cpp's mmap-based tensor access reads enough of the file correctly to run inference.

But sha256sum file, io.Copy(sha256.New(), f), and several other read-based hashing approaches against the exact same file return different wrong hashes on the same system. I wasn't able to fully isolate where the read path goes wrong — it's somewhere below ollama and I ran out of ladder. Since I can't fix the underlying cause from here, this PR provides a way to sidestep the re-read entirely for users who hit the same symptom.

The "digest mismatch, got: different on every retry" pattern shows up repeatedly in #941, #8105, #13775, #14554, #11831, #3931, #3326, and several of those reporters also mention that memtest86+ comes up clean and retries don't help. I can't say whether their root cause is the same as mine, and this PR does not claim to fix any of those issues — it just gives affected users a workaround to try.

Changes

envconfig/config.go — adds VerifyInlineHash = Bool("OLLAMA_VERIFY_INLINE_HASH") and a corresponding entry in the EnvVar description map.

server/download.go:

  • Prepare() branches on envconfig.VerifyInlineHash(). The default branch is the existing multi-part code, untouched. The new prepareInline() method wipes any pre-existing -partial* files, initializes b.inlineHash = sha256.New(), and creates a single part covering the entire blob (required because SHA256 state can't be shared across concurrent writers).
  • downloadChunk() now rejects any response whose status is not 206 Partial Content or 200 OK, unconditionally — this applies to both code paths and catches cases like expired CDN signed URLs returning 403 or gateway error pages that would otherwise be silently written into the blob file.
  • When b.inlineHash != nil, downloadChunk() wraps the response body in io.TeeReader(resp.Body, b.inlineHash) so the hasher receives exactly the bytes being written to the file.
  • When b.inlineHash != nil, run() disables the retry loop (maxTries = 1) because a mid-stream retry can't cleanly resume an in-progress SHA256 state, and compares the inline digest to the expected one before renaming; on mismatch or error it cleans up all partial files.
  • downloadBlob() now returns (cacheHit bool, inlineVerified bool, err error) so callers can tell whether the post-download verify pass is still needed.

server/images.go — PullModel() still runs the verifyBlob() loop by default and skips it for blobs that came back with inlineVerified=true. Cached blobs continue to skip verification as before.
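The resulting skip rule is simple enough to state as a predicate. The helper below is hypothetical (the PR expresses this inline in PullModel), but it captures the described behavior:

```go
package main

import "fmt"

// needsVerify captures the skip rule: the post-download verifyBlob pass
// runs only for blobs that were neither served from cache nor verified
// inline during download. Hypothetical helper for illustration.
func needsVerify(cacheHit, inlineVerified bool) bool {
	return !cacheHit && !inlineVerified
}

func main() {
	fmt.Println(needsVerify(false, false)) // default path: verify
	fmt.Println(needsVerify(false, true))  // inline-verified: skip
	fmt.Println(needsVerify(true, false))  // cached: skip, as before
}
```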

Trade-offs (opt-in mode only)

These apply only when OLLAMA_VERIFY_INLINE_HASH=1 is set:

  • No multi-part parallelism. Downloads run as a single sequential stream.
  • No cross-invocation resume. Prepare wipes any pre-existing -partial* files because the inline hasher can't incorporate pre-existing bytes.
  • No mid-stream retries. A network error fails the whole download; the caller has to re-invoke ollama pull.

Users unaffected by whatever is breaking the re-read don't need to set the env var and see none of these trade-offs.

Testing

  • All 388 existing tests in the server package pass (go test ./server/ -count=1).
  • go build ./..., gofmt -l, and go vet ./server/ clean.
  • End-to-end verified on the affected machine: with OLLAMA_VERIFY_INLINE_HASH=1, POST /api/pull for gemma4:31b completes in ~4m40s, the partial file is promoted to its final blob path, the manifest is written, the normal v0.20.2 daemon sees gemma4:31b in ollama list as soon as it's restarted, and ollama run gemma4:31b "hi" returns a coherent response. Without the env var, the same pull on the same machine fails at the verify step every time with a different wrong got: hash.

Relationship to #15028

#15028 adds pre-rename verification via a new verifyBlobFile() helper, which opens the file and runs GetSHA256Digest(f) — the same buffered read() path as the existing verifyBlob(). On systems where that read path is the source of the mismatch, #15028 would catch the problem one step earlier but would still fail at the same underlying operation. Its HTTP status code check and partial cleanup on mismatch are both independently valuable, and this PR incorporates the status code check. The inline-hash opt-in is orthogonal to both and could land alongside either.

🤖 Generated with Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 01:51:31 -05:00

Reference: github-starred/ollama#46420