[GH-ISSUE #10737] again, ollama is not using all VRAM with newer models, and I think with older ones too. #7051

Closed
opened 2026-04-12 18:58:04 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @Fade78 on GitHub (May 16, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10737

What is the issue?

ollama version: 0.7.0

This is Qwen3:32b with a context of 19000, which should fit entirely in VRAM; it has done so before. But as you can see, ollama thinks the model is 66GB in size and decides to load part of it outside the GPUs. Those three GPUs are dedicated to LLMs, with only one process per GPU, yet much of the VRAM sits unused. I think this is related to another bug I opened, which was closed without investigation.
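For reproducibility, one common way such a context size is set is sketched below; the exact invocation isn't shown in the original report, so treat this as an assumption:

```shell
# Hypothetical reproduction; the report doesn't show the exact invocation.
# In the interactive CLI, the context window can be set per session:
ollama run qwen3:32b
# then, inside the REPL: /set parameter num_ctx 19000

# Or via the HTTP API, passing num_ctx in the options object:
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:32b",
  "prompt": "test",
  "options": { "num_ctx": 19000 }
}'
```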

So what is wrong:

  • Qwen3:32b shouldn't, if memory serves, take 66GB of VRAM; it should fit in 48GB.
  • Ollama decides to split the imaginary 66GB 26/74, which means it believes I have 48GB of VRAM (74% of 66GB ≈ 49GB). I do.
  • Ollama puts the imaginary 48GB portion in VRAM, but in reality only half of it is consumed.

![Image](https://github.com/user-attachments/assets/fbe8f03e-114b-489c-bf8d-cfbe855407d5)
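For context (not part of the original report), ollama's planned split can be compared against actual memory usage with standard tooling; a minimal sketch, assuming an NVIDIA setup with the model already loaded:

```shell
# ollama's view: reported model size and CPU/GPU split (e.g. "26%/74% CPU/GPU")
ollama ps

# actual per-GPU memory consumption, to compare against ollama's estimate
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```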

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 18:58:04 -05:00

@rick-github commented on GitHub (May 16, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

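A minimal sketch of capturing those logs, following ollama's troubleshooting docs (assuming a systemd-managed install on Linux):

```shell
# systemd-managed install: dump the server log
journalctl -u ollama --no-pager

# or run the server in the foreground with verbose debug logging
OLLAMA_DEBUG=1 ollama serve
```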

@Fade78 commented on GitHub (May 16, 2025):

0.7.0 doesn't appear here anymore, so I'm closing this issue; I'll be back with 0.6.8, which I use now.

Reference: github-starred/ollama#7051