VRAM not being fully utilized #5662

Open
opened 2025-11-12 13:06:03 -06:00 by GiteaMirror · 3 comments
Owner

Originally created by @linh1987 on GitHub (Jan 31, 2025).

What is the issue?

Hey guys,
I notice that ollama doesn't make full use of my 3090's VRAM. it seems from the log that the detected VRAM available (~22.7GB) is lower then the real available VRAM reported by nvidia-smi (~23.8GB). The resulted effect is for some model, I can't fit both model and context into VRAM, while I can squeeze the same amount of layers and context cache fully in VRAM using koboldcpp.

Is there anything we can do to get it to work? I attached the relevant output from nvidia-smi and ollama's server log. I use q8 KV cache quantization.

ollama.txt
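To reproduce the comparison described above, you can query nvidia-smi's per-GPU totals directly and check them against the free-memory figure in Ollama's server log. The sketch below is a minimal, hypothetical helper (the function names are mine, not part of any library); the `--query-gpu` and `--format` flags are standard nvidia-smi options.

```python
import subprocess

def parse_memory_line(line):
    """Parse one CSV line like '24576, 23288' into (total_mib, free_mib)."""
    total, free = (int(x.strip()) for x in line.split(","))
    return total, free

def gpu_memory_mib():
    """Query total and free VRAM (in MiB) per GPU via nvidia-smi.

    Returns a list of (total, free) tuples, one entry per GPU.
    """
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.total,memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [parse_memory_line(line) for line in out.strip().splitlines()]
```

Comparing `gpu_memory_mib()` taken right before the model loads against the "available" value in ollama's log would show how much headroom its GPU discovery is reserving on top of what the driver actually reports as free.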

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.5.7

GiteaMirror added the bug label 2025-11-12 13:06:03 -06:00
Author
Owner

@qits-kkruse commented on GitHub (Jan 31, 2025):

Do you run your GUI on that same card? I believe it needs a bit of VRAM as well; 1.1GB of VRAM looks reasonable to me.

My GUI (KDE/Wayland + Firefox and lots of tabs) uses 1.6GB.

Author
Owner

@linh1987 commented on GitHub (Jan 31, 2025):

I run on Windows, and only 0.3GB of VRAM is in use before the first model loads; even then, only 22.7GB of VRAM is reported as available. Nothing else is running on the computer.

Author
Owner

@tie-pilot-qxw commented on GitHub (Feb 1, 2025):

I have the same issue when running on my 4060 laptop on WSL. There is a total of 8.0 GB, but only 6.9 GB is available. I have raised issue #7996, but it's not solved yet.


Reference: github-starred/ollama-ollama#5662