[GH-ISSUE #6612] GPU Not Use #29925

Closed
opened 2026-04-22 09:16:34 -05:00 by GiteaMirror · 2 comments

Originally created by @MEnsar55 on GitHub (Sep 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6612

Originally assigned to: @dhiltgen on GitHub.

Hello, I am using Windows 11 and I have installed Llama 3.1 7B and Gemma 2 27B in Ollama. When using these models the GPU is between 0%-1%. What should I do?
My laptop specs are:
HP Victus
14700hx
4070
32 GB RAM

GiteaMirror added the nvidia, needs more info, windows labels 2026-04-22 09:16:34 -05:00

@rick-github commented on GitHub (Sep 3, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging.

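As a starting point, here is a minimal sketch for scanning the Windows server log for GPU-related lines. It assumes the default log location under `%LOCALAPPDATA%\Ollama` described in the linked troubleshooting guide, and it assumes the relevant lines mention "offload" or "VRAM"; adjust the keywords to whatever your log actually contains.

```python
# Minimal sketch: scan the Ollama server log on Windows for lines that
# indicate how much of a model was offloaded to the GPU.
# Assumption: the default Windows log location (%LOCALAPPDATA%\Ollama\server.log),
# per the troubleshooting guide linked above.
import os
from pathlib import Path

log_path = Path(os.environ["LOCALAPPDATA"]) / "Ollama" / "server.log"

with open(log_path, encoding="utf-8", errors="replace") as f:
    for line in f:
        # Assumption: lines about layer offloading or VRAM budgeting
        # contain one of these keywords.
        if "offload" in line or "VRAM" in line:
            print(line.rstrip())
```

Running `ollama ps` while a model is loaded also reports the CPU/GPU split directly in its PROCESSOR column.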

@dhiltgen commented on GitHub (Sep 25, 2024):

There is no llama3.1 7B, but there is a 70B, and given you also mentioned gemma2 27B, I believe you're trying to load large models on a small VRAM system. Those two models are ~40G and ~16G respectively. I believe your GPU is an 8G card. Neither of these models will fit in VRAM, so they'll have to load mostly into system RAM and run on your CPU. Low GPU utilization in this case is expected, as the CPU is the bottleneck. Try loading smaller models that fit in your VRAM and you'll see much better performance. If you want the quality of these larger models, they will run slowly on your system.

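To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes roughly 4-bit quantization at about 0.57 bytes per parameter (a hypothetical constant calibrated to the ~40G and ~16G figures quoted above); real sizes vary with quantization and context length.

```python
# Rough sketch: estimate whether a quantized model fits in VRAM.
# Assumption: ~0.57 bytes per parameter, approximating 4-bit weights
# plus overhead for the KV cache and buffers. Real sizes vary with
# quantization level and context length.
def approx_size_gb(params_billions: float, bytes_per_param: float = 0.57) -> float:
    return params_billions * bytes_per_param

VRAM_GB = 8  # laptop RTX 4070, per the comment above

for name, params in [("llama3.1:70b", 70), ("gemma2:27b", 27), ("llama3.1:8b", 8)]:
    size = approx_size_gb(params)
    fits = "fits in VRAM" if size <= VRAM_GB else "spills to system RAM / CPU"
    print(f"{name}: ~{size:.0f} GB -> {fits}")
```

By this estimate a 70B model (~40 GB) and a 27B model (~16 GB) both overflow an 8 GB card, while an 8B model (~5 GB) fits, which is why smaller models show much better GPU utilization.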

Reference: github-starred/ollama#29925