[GH-ISSUE #7801] GPU usage is not high, but the display memory is full #67044

Closed
opened 2026-05-04 09:20:08 -05:00 by GiteaMirror · 2 comments

Originally created by @duolax on GitHub (Nov 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7801

What is the issue?

![image](https://github.com/user-attachments/assets/1c2cf4c5-069b-4d48-8d9f-8d8d4a4ebcd6)
When running a model, CPU usage is often at 100%, but GPU utilization is not high. However, on checking the details, GPU memory turns out to be fully occupied while the model is running.

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.4.2

GiteaMirror added the bug label 2026-05-04 09:20:08 -05:00

@rick-github commented on GitHub (Nov 23, 2024):

If you have loaded a model that does not fit entirely in GPU memory, part of the model is loaded into system RAM and the CPU runs inference on that part. When you do a completion, the part that runs on the GPU finishes quickly, and the part on the CPU takes longer. The CPU therefore shows high usage while the GPU sits idle, waiting for the next completion.
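
As a rough illustration of the explanation above, here is a minimal sketch of how a split model leaves the GPU mostly idle. It assumes the layers run one after another, so only one device is busy at a time. The function name `gpu_busy_fraction` and all layer counts and timings are hypothetical, chosen only to show the arithmetic; they are not measurements from Ollama.

```python
def gpu_busy_fraction(gpu_layers, cpu_layers, gpu_ms_per_layer, cpu_ms_per_layer):
    """Share of per-token wall time the GPU spends computing.

    Assumes a strictly sequential layer split: while the CPU works on
    its layers, the GPU waits, and vice versa.
    """
    gpu_ms = gpu_layers * gpu_ms_per_layer
    cpu_ms = cpu_layers * cpu_ms_per_layer
    total_ms = gpu_ms + cpu_ms
    return gpu_ms / total_ms if total_ms else 0.0


# Hypothetical split: 24 layers on the GPU, 8 spilled to the CPU,
# with the CPU assumed to be 20x slower per layer.
frac = gpu_busy_fraction(gpu_layers=24, cpu_layers=8,
                         gpu_ms_per_layer=0.5, cpu_ms_per_layer=10.0)
print(f"GPU busy: {frac:.0%}")  # GPU busy: 13%
```

Under these assumed numbers, three quarters of the layers sit on the GPU and fill its memory, yet the GPU computes for only about 13% of each token's time. That matches the pattern in the report: full GPU memory, low GPU utilization, busy CPU.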


@jmorganca commented on GitHub (Nov 23, 2024):

Thanks for the issue. @rick-github is correct – this is common when running the model requires the CPU.
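
One way to check whether a loaded model is spilling over to the CPU is to compare its total size with the portion held in GPU memory. The sketch below assumes a local Ollama server on the default port 11434 and assumes that its running-models endpoint (`/api/ps`) reports `size` and `size_vram` in bytes for each model. The helper names `vram_share`, `loaded_models`, and `report` are illustrative only; verify the field names against the API documentation for your Ollama version.

```python
import json
import urllib.request


def vram_share(size, size_vram):
    """Fraction of a model's bytes that reside in GPU memory."""
    return size_vram / size if size else 0.0


def loaded_models(base_url="http://localhost:11434"):
    """Fetch the list of currently loaded models (assumed /api/ps endpoint)."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return json.load(resp).get("models", [])


def report():
    """Print, per loaded model, how much of it is in GPU memory."""
    for m in loaded_models():
        share = vram_share(m.get("size", 0), m.get("size_vram", 0))
        print(f"{m.get('name')}: {share:.0%} in GPU memory")


# Call report() while a model is loaded and the Ollama server is running.
```

A share below 100% would indicate that part of the model is running on the CPU. The `ollama ps` command should give a similar CPU/GPU breakdown from the command line.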

Reference: github-starred/ollama#67044