[GH-ISSUE #9618] Disk space usage #68331

Closed
opened 2026-05-04 13:14:50 -05:00 by GiteaMirror · 4 comments

Originally created by @gty1829 on GitHub (Mar 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9618

Why is it necessary to copy the weights and re-encrypt them with SHA-256 when converting models? This process causes excessive memory usage when I import models locally.

GiteaMirror added the question label 2026-05-04 13:14:50 -05:00

@krenax commented on GitHub (Mar 10, 2025):

Can you provide an example?


@pdevine commented on GitHub (Mar 11, 2025):

@gty1829 they're not "re-encrypted"; the SHA-256s are used to calculate the checksum of the model. Are you pulling with `ollama pull` or importing the weights with `ollama create`?

EDIT: I just saw that you mentioned `converting`, so I'm guessing you meant with `ollama create`. Are you converting a GGUF file or are you converting from Safetensors?
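
For context, the SHA-256 step is a one-way digest used for naming and integrity checking, not encryption. Here is a minimal Go sketch of how a content-addressed store might derive a blob name from a weights file; the `sha256-<digest>` naming and the example filename are assumptions for illustration, not a claim about Ollama's exact internals:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// blobName streams a file through SHA-256 and returns a
// content-addressed name like "sha256-<hex digest>". The file is
// read in chunks, so the whole model never has to sit in memory.
func blobName(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return "sha256-" + hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	// Hypothetical shard name, just for the example.
	name, err := blobName("model-00001-of-00004.safetensors")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(name)
}
```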


@gty1829 commented on GitHub (Mar 11, 2025):

I'm converting from Safetensors. I mean, I downloaded the safetensors of qwen2.5-7B-Instruct-1M, and I ran the command `ollama create qwen2.5 -f ./Modelfile`. Then I found files named with SHA-256 hashes under `%OLLAMA_MODELS%\blobs`, and their size is the same as the storage space occupied by the safetensors. So I guess these are actually the model weights. Why not read the safetensors directly? Thank you.

![Image](https://github.com/user-attachments/assets/038ad874-60f9-427a-8f84-f7cb5cff224e)

![Image](https://github.com/user-attachments/assets/27d6488b-9b02-4814-94d7-5371313f17ab)
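
A quick way to check what those files are: recompute each blob's digest and compare it with its filename. A minimal Go sketch, assuming the `sha256-<hex digest>` naming convention visible in the screenshots:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Assumed layout: blobs live under %OLLAMA_MODELS%\blobs and are
	// named "sha256-<hex digest>" of their own contents.
	blobDir := filepath.Join(os.Getenv("OLLAMA_MODELS"), "blobs")
	entries, err := os.ReadDir(blobDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		want, ok := strings.CutPrefix(e.Name(), "sha256-")
		if !ok {
			continue
		}
		f, err := os.Open(filepath.Join(blobDir, e.Name()))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		h := sha256.New()
		if _, err := io.Copy(h, f); err != nil {
			f.Close()
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		f.Close()
		got := hex.EncodeToString(h.Sum(nil))
		fmt.Printf("%s: match=%v\n", e.Name(), got == want)
	}
}
```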


@pdevine commented on GitHub (Mar 11, 2025):

Yes, the weights are cut up into blobs (this will result in more blobs in the future, as the model will be cut up by tensor instead of one large file). If you're not using the `--quantize` option with `ollama create`, the weights will be the same size, but the data type will have changed from bfloat16 to fp16 for many of the tensors, so the weights aren't identical.
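
To illustrate why the converted weights stay the same size yet are not byte-identical: bfloat16 and fp16 are both 16-bit formats, but they split the bits differently (8 exponent / 7 mantissa bits vs. 5 / 10), so every value must be re-encoded. A minimal Go sketch of that conversion, rounding to nearest-even for normal values (NaN and subnormal handling simplified; this is illustrative, not Ollama's actual converter):

```go
package main

import (
	"fmt"
	"math"
)

// bf16ToFloat32 widens a bfloat16 value: bf16 is simply the upper
// 16 bits of an IEEE-754 float32, so widening is a shift.
func bf16ToFloat32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// float32ToFP16 narrows a float32 to IEEE-754 half precision.
// For brevity, NaN maps to Inf and subnormal results flush to zero.
func float32ToFP16(f float32) uint16 {
	bits := math.Float32bits(f)
	sign := uint16(bits>>16) & 0x8000
	exp := int32((bits>>23)&0xFF) - 127 + 15 // rebias the exponent
	mant := bits & 0x7FFFFF

	switch {
	case exp >= 31: // too large for fp16 (or Inf/NaN): return Inf
		return sign | 0x7C00
	case exp <= 0: // too small for a normal fp16: flush to zero
		return sign
	}

	h := sign | uint16(exp)<<10 | uint16(mant>>13)
	// Round to nearest, ties to even, on the 13 dropped bits.
	rem := mant & 0x1FFF
	if rem > 0x1000 || (rem == 0x1000 && h&1 == 1) {
		h++ // a carry here correctly bumps the exponent too
	}
	return h
}

func main() {
	// A bf16 pattern for ~3.14: sign 0, biased exp 128, mantissa 0x49.
	b := uint16(0x4049)
	f := bf16ToFloat32(b)
	fmt.Printf("bf16 0x%04X -> float32 %v -> fp16 0x%04X\n",
		b, f, float32ToFP16(f))
}
```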

The reason for cutting the weights into blobs is so that we can deduplicate any weights which are shared between different models. If you change the Modelfile and run `ollama create` again, it will only store one copy of the weights, but you'll see two models listed in `ollama ls`. Hopefully that makes sense. I'll go ahead and close the issue as answered, but feel free to keep commenting.
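
The deduplication falls out of content addressing: a blob's name is the digest of its bytes, so a second model referencing identical weights maps to a file that already exists and the write can simply be skipped. A hedged sketch of that idea; `writeBlob` is a hypothetical helper, not Ollama's API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// writeBlob stores data under its content digest. If a blob with the
// same digest already exists, the bytes are shared rather than
// duplicated -- that is the whole deduplication mechanism.
func writeBlob(dir string, data []byte) (string, error) {
	sum := sha256.Sum256(data)
	name := "sha256-" + hex.EncodeToString(sum[:])
	path := filepath.Join(dir, name)
	if _, err := os.Stat(path); err == nil {
		return name, nil // already stored: deduplicated
	}
	return name, os.WriteFile(path, data, 0o644)
}

func main() {
	dir, _ := os.MkdirTemp("", "blobs")
	weights := []byte("pretend these are tensor bytes")

	// Two "models" importing the same weights share one blob.
	a, _ := writeBlob(dir, weights)
	b, _ := writeBlob(dir, weights)
	fmt.Println(a == b) // true: one file on disk, two references
}
```

Each model's manifest would then just list the digests it needs, which is why two Modelfiles over the same weights cost almost no extra disk.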
