[GH-ISSUE #9854] Ollama Leaves Copied Files when Importing Finetuned Safetensors #52964

Open
opened 2026-04-29 01:29:12 -05:00 by GiteaMirror · 1 comment

Originally created by @chigkim on GitHub (Mar 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9854

Originally assigned to: @pdevine on GitHub.

What is the issue?

I imported a finetuned gemma-3-27b from Safetensors.
ollama create gemma-3-27b-finetuned-q8_0 --quantize q8_0 -f gemma3.modelfile
It looks like it copied all the original Safetensors weights into models/blobs, then produced an FP16 GGUF, and finally the quantized GGUF. I ended up with more than 100 GB of data in the blobs folder that I no longer need:

  • Original Safetensors weights
  • FP16 GGUF
  • Quantized GGUF
Ollama should delete everything except the quantized model.
Thanks!
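For anyone hitting the same disk-usage surprise, the leftover blobs can be inspected before cleaning up. This is a hedged sketch that assumes the default model store location (`~/.ollama/models`, overridable with the `OLLAMA_MODELS` environment variable):

```shell
# Assumes the default model store location; adjust if OLLAMA_MODELS is set.
BLOBS="${OLLAMA_MODELS:-$HOME/.ollama/models}/blobs"

if [ -d "$BLOBS" ]; then
  du -sh "$BLOBS"                          # total size of the blob store
  du -h "$BLOBS"/* | sort -h | tail -n 5   # five largest blobs
else
  echo "no blob store at $BLOBS"
fi
```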

Relevant log output

N/A (no error).

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

v0.6.2-rc0

GiteaMirror added the bug label 2026-04-29 01:29:12 -05:00

@pdevine commented on GitHub (Mar 18, 2025):

Ollama should have cleaned up the FP16 blob after the conversion. It will never clean up the original Safetensors weights (you're on your own for that). If you restart the server, Ollama will clean up any unused blobs: a mark+sweep pass checks for dangling blobs and removes them automatically.

There were some changes to the create endpoint recently, so I'll check and make sure it's working correctly. In the meantime, just restart the ollama server.
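The restart-time cleanup described above can be illustrated with a toy mark+sweep over a throwaway directory. This is a hypothetical sketch with made-up digests and a simplified manifest format, not Ollama's actual implementation:

```shell
#!/bin/sh
# Toy mark+sweep sketch on a throwaway directory. File layout, digests, and
# the manifest format are simplified stand-ins, not Ollama's real on-disk store.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/manifests" "$tmp/blobs"

# A "manifest" references blobs by digest (input to the mark phase).
printf '{"layers":[{"digest":"sha256-aaa"},{"digest":"sha256-bbb"}]}' \
  > "$tmp/manifests/model.json"

# Blobs on disk: two referenced, one dangling (e.g. a leftover FP16 blob).
touch "$tmp/blobs/sha256-aaa" "$tmp/blobs/sha256-bbb" "$tmp/blobs/sha256-ccc"

# Mark: collect every digest any manifest still points at.
grep -ho 'sha256-[a-z0-9]*' "$tmp/manifests/"*.json | sort -u > "$tmp/marked"

# Sweep: remove blobs whose name is not in the marked set.
for blob in "$tmp/blobs/"*; do
  grep -qxF "$(basename "$blob")" "$tmp/marked" || rm "$blob"
done

survivors=$(ls "$tmp/blobs" | sort | paste -sd' ' -)
echo "$survivors"   # sha256-aaa sha256-bbb
rm -rf "$tmp"
```

The real sweep runs over Ollama's manifest and blob store at server startup; the sketch only shows the mark-then-sweep shape of the algorithm.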


Reference: github-starred/ollama#52964