[GH-ISSUE #15651] create: blobs can still be GC-pruned during long-running create operations after the 1h grace window #56500

Open
opened 2026-04-29 10:55:51 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @shaun0927 on GitHub (Apr 17, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15651

What is the issue?

PR #15628 reduces the create/GC race by skipping pruning for blobs newer than 1 hour, but the failure mode still exists for create/import flows that run longer than that window.

Right now the protection is still wall-clock based:

  • server/images.go sets layerPruneGracePeriod = time.Hour
  • PruneLayers() skips blobs only while time.Since(info.ModTime()) < layerPruneGracePeriod
  • manifest/layer.go refreshes mtimes when layers are created or reused, but there is no durable in-progress signal while a long-running create is still active

That means a create flow can still lose pending blobs if:

  1. blobs have already been written,
  2. the manifest has not been written yet, and
  3. another Ollama process runs GC after those blob mtimes age past the 1-hour grace window.

I validated the mechanism locally against current main (57653b8e42d69ec35f68a59857bad4d0f07994a3) by:

  1. creating a blob with manifest.NewLayer(...)
  2. aging the blob mtime beyond the grace window
  3. calling PruneLayers()
  4. confirming the blob is deleted
  5. confirming a later manifest.NewLayerFromLayer(...) fails because the blob is gone

Expected behavior: blobs that are still part of an active create/import should not become GC-eligible solely because more than 1 hour elapsed before the manifest was written.

Impact: long-running create/import/quantize flows can still fail late and lose work.

Relevant log output

=== RUN   TestValidation_LongCreateBlobCanBePrunedAfterGraceWindow
time=2026-04-18T02:08:44.323+09:00 level=INFO msg="total blobs: 1"
time=2026-04-18T02:08:44.323+09:00 level=INFO msg="total unused blobs removed: 1"
--- PASS: TestValidation_LongCreateBlobCanBePrunedAfterGraceWindow (0.00s)

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

main @ 57653b8e42d69ec35f68a59857bad4d0f07994a3 (v0.21.0 tag points at the same commit)


@rick-github commented on GitHub (Apr 17, 2026):

A workaround is to set OLLAMA_NOPRUNE=1 in all but one of the servers accessing the model directory.
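Concretely, in a two-server setup sharing one model directory, that workaround looks like this (the `OLLAMA_HOST` value is illustrative; the `serve` commands are commented out because they run indefinitely):

```shell
# Server A: the only server allowed to prune, runs with defaults.
# ollama serve

# Server B: pruning disabled so it cannot GC blobs a long-running
# create on server A has written but not yet committed to a manifest.
export OLLAMA_NOPRUNE=1
export OLLAMA_HOST=127.0.0.1:11435
# ollama serve
echo "OLLAMA_NOPRUNE=$OLLAMA_NOPRUNE"
```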


Reference: github-starred/ollama#56500