[GH-ISSUE #10357] Why is the CPU and GPU used in a mix instead of 100% GPU? #32564

Closed
opened 2026-04-22 13:57:49 -05:00 by GiteaMirror · 6 comments

Originally created by @kirito201711 on GitHub (Apr 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10357

PROCESSOR: 33%/67% CPU/GPU

@kth8 commented on GitHub (Apr 21, 2025):

The model size exceeds your GPU's VRAM capacity, so it has to load part of it onto the CPU/system RAM.
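A quick way to confirm this split is to ask the running server how much of each loaded model is resident in VRAM versus system RAM. A minimal sketch, assuming a local ollama server on the default port (`http://localhost:11434`) and the `size`/`size_vram` fields returned by the REST `/api/ps` endpoint:

```python
# Minimal sketch: query a local ollama server for the VRAM/RAM split of
# each loaded model. Assumes the default endpoint (http://localhost:11434);
# the size/size_vram fields come from ollama's REST /api/ps endpoint.
import json
from urllib.request import urlopen

with urlopen("http://localhost:11434/api/ps") as resp:
    data = json.load(resp)

for model in data.get("models", []):
    total = model["size"]         # total bytes the loaded model occupies
    in_vram = model["size_vram"]  # bytes resident on the GPU
    pct_gpu = 100 * in_vram / total if total else 0
    print(f"{model['name']}: {pct_gpu:.0f}% GPU / {100 - pct_gpu:.0f}% CPU")
```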

<!-- gh-comment-id:2818033465 --> @kth8 commented on GitHub (Apr 21, 2025): the model size exceeds your GPU VRAM capacity so it has to load part of it on to the CPU/system RAM
Author
Owner

@pdevine commented on GitHub (Apr 21, 2025):

Going to go ahead and close this as answered. @kirito201711 unfortunately your GPU doesn't have enough VRAM to hold the model you're trying to run, so it starts using some of your system memory. You can try using a different model, or one which is more quantized.
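As a rough illustration of what a more quantized model buys, here is some weight-only arithmetic for a 12B-parameter model. The bits-per-weight figures are approximate averages for llama.cpp-style formats (real formats mix bit widths), and KV cache and runtime buffers are ignored:

```python
# Rough weight-only sizes for a 12B-parameter model at a few common
# quantization levels. Bits-per-weight values are approximate averages;
# KV cache and runtime buffers are not included.
PARAMS = 12e9
for name, bits_per_weight in [("f16", 16.0), ("q8_0", 8.5), ("q4_K_M", 4.8)]:
    gb = PARAMS * bits_per_weight / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
```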


@kirito201711 commented on GitHub (Apr 22, 2025):

Less than half of my VRAM is being used.

> The model size exceeds your GPU's VRAM capacity, so it has to load part of it onto the CPU/system RAM.


@kirito201711 commented on GitHub (Apr 22, 2025):

> Going to go ahead and close this as answered. @kirito201711 unfortunately your GPU doesn't have enough VRAM to hold the model you're trying to run, so it starts using some of your system memory. You can try using a different model, or one which is more quantized.

Less than half of my VRAM is being used.


@pdevine commented on GitHub (Apr 22, 2025):

@kirito201711 each layer (which includes multiple tensors) that doesn't fit on the GPU has to be offloaded into system memory, so if you don't have a lot of VRAM it's going to look like the GPU is underutilized. You didn't mention which model you were trying to load or which GPU you're using, though.
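A back-of-the-envelope sketch of that layer split. This is not ollama's actual scheduler logic, and every number below is an illustrative assumption; it just shows why a model slightly larger than free VRAM ends up partially on the CPU:

```python
# Back-of-the-envelope layer-split estimate. NOT ollama's actual scheduler;
# illustrates why a model slightly larger than free VRAM is split.
def estimate_split(model_bytes, num_layers, free_vram_bytes, overhead_bytes):
    """Return (layers_on_gpu, layers_on_cpu), assuming uniform layer sizes."""
    per_layer = model_bytes / num_layers
    usable = max(free_vram_bytes - overhead_bytes, 0)
    gpu_layers = min(int(usable // per_layer), num_layers)
    return gpu_layers, num_layers - gpu_layers

# Illustrative numbers: an 8.1 GB model with 48 layers, 8 GB of free VRAM,
# and ~1.5 GB reserved for KV cache and compute buffers (all assumptions).
gpu, cpu = estimate_split(8.1e9, 48, 8e9, 1.5e9)
print(f"{gpu} layers on GPU, {cpu} layers on CPU")  # -> 38 on GPU, 10 on CPU
```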


@kirito201711 commented on GitHub (Apr 24, 2025):

> @kirito201711 each layer (which includes multiple tensors) that doesn't fit on the GPU has to be offloaded into system memory, so if you don't have a lot of VRAM it's going to look like the GPU is underutilized. You didn't mention which model you were trying to load or which GPU you're using, though.

I found the problem: the gemma3:12b model I was running is 8.1 GB, and my GPU has only 8 GB of VRAM.

Reference: github-starred/ollama#32564