[GH-ISSUE #6628] no space left on device - ubuntu #66208

Closed
opened 2026-05-04 00:49:17 -05:00 by GiteaMirror · 7 comments

Originally created by @fahadshery on GitHub (Sep 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6628

What is the issue?

I am getting the following error:

root@ollama:/home/user/hugging-face-models# ollama create Lily-Cybersecurity-7B-v0.2
transferring model data 100%
converting model
Error: write /usr/share/ollama/.ollama/models/blobs/3649787503/fp164117961266: no space left on device

but I do have space?

root@ollama:/home/user/hugging-face-models# df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              4.8G  2.6M  4.7G   1% /run
efivarfs                           256K  123K  129K  49% /sys/firmware/efi/efivars
/dev/mapper/ubuntu--vg-ubuntu--lv  274G  199G   64G  76% /
tmpfs                               30G     0   30G   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/sda2                          2.0G  183M  1.7G  10% /boot
/dev/sda1                          1.1G  6.2M  1.1G   1% /boot/efi
overlay                            274G  199G   64G  76% /var/lib/docker/overlay2/dda534be667e34c729e88f904317afb8564a19c9b619e0c0ec6381127d996900/merged

how do I resolve it?

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.6

GiteaMirror added the bug label 2026-05-04 00:49:17 -05:00

@fenggaobj commented on GitHub (Sep 4, 2024):

You can point OLLAMA_MODELS at a location with more storage instead of the default location, for instance: export OLLAMA_MODELS=/workspace/ollama
Alternatively, you can add the environment variable directly to the ollama.service file.
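
As a rough sketch of the second approach (assuming the stock systemd unit created by the Linux installer, and that /workspace/ollama exists and is writable by the ollama user):

sudo systemctl edit ollama.service
# in the override file that opens, add:
#   [Service]
#   Environment="OLLAMA_MODELS=/workspace/ollama"
sudo systemctl restart ollama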


@fahadshery commented on GitHub (Sep 4, 2024):

> You can point OLLAMA_MODELS at a location with more storage instead of the default location, for instance: export OLLAMA_MODELS=/workspace/ollama. Alternatively, you can add the environment variable directly to the ollama.service file.

Thanks for this. I was checking out the ollama.service file from this FAQ (https://github.com/ollama/ollama/blob/main/docs/faq.md#where-are-models-stored), but it's not clear how to change the models directory.
I have changed the service file:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
OLLAMA_MODELS=/home/fahadshery/hugging-face-models

and then restarted the service but it's still showing the old location for storing models.

What I don't understand is that /usr should be part of /, which shows more than 50G available, yet it still runs out of storage?
Do I need to increase the tmp location as well for the runners?


@jmorganca commented on GitHub (Sep 4, 2024):

@fahadshery I believe this should fix it for you, to use:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/home/fahadshery/hugging-face-models"

You can also set OLLAMA_TMPDIR for the location used to extract any temp files needed for execution.
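
A short follow-up sketch (assuming the standard Linux install, where the service runs as the ollama user): the new directory has to exist and be writable by that user, and the unit change only takes effect after a reload and restart. The path below is simply the one from the snippet above.

sudo mkdir -p /home/fahadshery/hugging-face-models
sudo chown -R ollama:ollama /home/fahadshery/hugging-face-models   # assumes the service runs as the ollama user
sudo systemctl daemon-reload
sudo systemctl restart ollama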


@rick-github commented on GitHub (Sep 4, 2024):

This is because of the way ollama does the conversion. For safetensors models like Lily-Cybersecurity-7B-v0.2, the ollama client collects the files into an archive and sends it to the server. For Lily-Cybersecurity-7B-v0.2, that's a 25G file. The ollama server unpacks the archive into the model directory but doesn't delete the archive just yet, so disk usage is now up to 50G. Then ollama converts the safetensors files into an FP16 GGUF file, which will also be ~25G. So at the end of the conversion process, ollama needs 75G to hold the archive, the working files, and the final model, which exceeds the 64G you have free.
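
If you want to watch this happen, a rough check (assuming the default install paths) is to keep an eye on the blobs directory and the root filesystem while the conversion runs:

du -sh /usr/share/ollama/.ollama/models/blobs   # the archive, the extracted files and the growing GGUF all land here
df -h /                                         # the root filesystem fills up accordingly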

You can work around this by using a quantized GGUF file. If you are OK with q4_K_M, you can fetch the quantized model from segolilylabs/Lily-Cybersecurity-7B-v0.2-GGUF (https://huggingface.co/segolilylabs/Lily-Cybersecurity-7B-v0.2-GGUF/tree/main).
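
A sketch of that route, assuming huggingface-cli is installed; the exact GGUF filename below is a guess, so check the repository listing first:

huggingface-cli download segolilylabs/Lily-Cybersecurity-7B-v0.2-GGUF lily-cybersecurity-7b-v0.2.Q4_K_M.gguf --local-dir .
echo "FROM ./lily-cybersecurity-7b-v0.2.Q4_K_M.gguf" > Modelfile
ollama create lily-cybersecurity-7b -f Modelfile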

If you want the full-fidelity model, you can convert the safetensors to a GGUF file yourself and then import that file. I use llama.cpp in a docker container for that, but you can also install llama.cpp (https://github.com/ggerganov/llama.cpp) locally and use the convert_hf_to_gguf.py script:

docker run --rm -v .:/model ghcr.io/ggerganov/llama.cpp:full -c --outtype f16 /model
echo FROM *.gguf > Modelfile
ollama create Lily-Cybersecurity-7B-v0.2

This skips the intermediate steps of extracting and converting the files, requiring less disk space.
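
If you'd rather skip docker, a local run of the same script might look roughly like this (the model directory name and output filename are assumptions):

git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./Lily-Cybersecurity-7B-v0.2 --outtype f16 --outfile lily-f16.gguf
echo "FROM ./lily-f16.gguf" > Modelfile
ollama create Lily-Cybersecurity-7B-v0.2 -f Modelfile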


@fahadshery commented on GitHub (Sep 4, 2024):

> docker run --rm -v .:/model ghcr.io/ggerganov/llama.cpp:full -c --outtype f16 /model
> echo FROM *.gguf > Modelfile
> ollama create Lily-Cybersecurity-7B-v0.2
>
> This skips the intermediate steps of extracting and converting the files, requiring less disk space.

Thank you for the perfect explanation. Docker is the perfect solution since I have already dockerised everything!
Yes, I want to retain maximum quality when converting from safetensors to GGUF. I tried looking up --outtype f16 but couldn't find much info on it.
Are there other output types?
What are the key differences?
Which output retains the maximum quality of a model?
Sorry, just getting started with this.

Many thanks


@fahadshery commented on GitHub (Sep 5, 2024):

Never mind, I read up on it.
F16 is actually half precision (Float16 quantisation, which ollama treats as full precision for commodity hardware). F32 is full precision (a Float32 representation of the model parameters).
Some more reading:

https://medium.com/@qdrddr/the-easiest-way-to-convert-a-model-to-gguf-and-quantize-91016e97c987

https://medium.com/@arko.basu09/docker-run-llama-2-models-on-an-orangepi-5b-using-llama-cpp-e024996d7c4c
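
For what it's worth, the conversion script prints the full list of valid output types itself; at the time of writing they included f32, f16, bf16, q8_0 and auto, but it's worth checking locally (assuming a local llama.cpp checkout as above):

python llama.cpp/convert_hf_to_gguf.py --help   # the --outtype choices are listed in the usage text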


@yarasiviswa commented on GitHub (May 2, 2026):

> What

This helped. Other forums were specifying Environment="OLLAMA_MODELS=/path/to/your/directory".
Environment is not needed; only OLLAMA_MODELS=/path/to/your/directory helps.
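
Whether the Environment= wrapper is needed depends on where the variable is set; a hedged sketch of the two contexts (both paths are placeholders):

# Inside a systemd unit or drop-in, the wrapper is systemd's own syntax:
#   [Service]
#   Environment="OLLAMA_MODELS=/path/to/your/directory"
# When launching the server manually from a shell, a plain export is enough:
export OLLAMA_MODELS=/path/to/your/directory
ollama serve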


Reference: github-starred/ollama#66208