[GH-ISSUE #8723] VRAM not being fully utilized #31417

Closed
opened 2026-04-22 11:51:05 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @linh1987 on GitHub (Jan 31, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8723

What is the issue?

Hey guys,
I notice that ollama doesn't make full use of my 3090's VRAM. it seems from the log that the detected VRAM available (~22.7GB) is lower then the real available VRAM reported by nvidia-smi (~23.8GB). The resulted effect is for some model, I can't fit both model and context into VRAM, while I can squeeze the same amount of layers and context cache fully in VRAM using koboldcpp.

Is there anything that we can do to get it to work? I attached the relevant output from nvidia-smi and ollama's server log. I use q8 kv cache quantization.

ollama.txt

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-22 11:51:05 -05:00
Author
Owner

@qits-kkruse commented on GitHub (Jan 31, 2025):

Do you run your GUI on that same card? I believe it needs a bit of VRAM as well; 1.1GB of VRAM looks reasonable to me.

My GUI (KDE/Wayland + Firefox and lots of tabs) uses 1.6GB.

Author
Owner

@linh1987 commented on GitHub (Jan 31, 2025):

I run on Windows, and only 0.3GB of VRAM is in use before the first model loads; even then, only 22.7GB of VRAM is reported as available. Nothing else is running on the computer.

Author
Owner

@tie-pilot-qxw commented on GitHub (Feb 1, 2025):

I have the same issue running on my 4060 laptop under WSL. There is a total of 8.0 GB, but only 6.9 GB is available. I raised issue #7996, but it's not solved yet.

Author
Owner

@rick-github commented on GitHub (Jan 5, 2026):

Memory estimation for some models is inaccurate. Recent versions run most models with a new allocation mechanism that improves memory utilisation. I'm going to close this as stale; if you still have problems, open a new issue and include a full [server log](https://docs.ollama.com/troubleshooting).


Reference: github-starred/ollama#31417