[GH-ISSUE #8655] GPU process at 1-3% when running Deepseek R1 32b #52120

Closed
opened 2026-04-28 22:08:18 -05:00 by GiteaMirror · 6 comments

Originally created by @BananasMan on GitHub (Jan 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8655

What is the issue?

I'm trying to run DeepSeek R1 32b locally. It runs, but the GPU is barely used.

When it's processing a simple task like multiplying numbers, Task Manager shows the GPU barely used at 1-3%, while the CPU is at 70%.
I should add, though, that both RAM and VRAM are heavily used and almost full.

GPU: RTX 3060 12GB
CPU: Ryzen 5 5600
RAM: 16GB

If you think my PC is not strong enough to run the 32b model, I don't mind recommendations for which R1 models I should use.

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

newest one

GiteaMirror added the bug label 2026-04-28 22:08:18 -05:00

@rick-github commented on GitHub (Jan 29, 2025):

deepseek r1 32b needs 20G. If you run `ollama ps` you will see that the model is only partially loaded on the GPU. The 14b, 8b, or 7b models should fit in your 12G of VRAM; 14b might be a squeeze if you use a large context window.
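
The 20G figure is roughly what a 4-bit quantization implies: 32 billion parameters × ~0.5 bytes ≈ 16GB of weights, plus KV cache and runtime overhead. For illustration, a partial offload shows up in the PROCESSOR column of `ollama ps`; the model ID and the exact split below are made up for a 12GB card:

```
$ ollama ps
NAME               ID              SIZE     PROCESSOR          UNTIL
deepseek-r1:32b    abcd1234ef56    21 GB    45%/55% CPU/GPU    4 minutes from now
```

Anything short of 100% GPU means some layers are evaluated on the CPU, and the CPU becomes the bottleneck.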


@BananasMan commented on GitHub (Jan 30, 2025):

> deepseek r1 32b needs 20G. If you run `ollama ps` you will see that the model is only partially loaded on the GPU. The 14b, 8b, or 7b models should fit in your 12G of VRAM; 14b might be a squeeze if you use a large context window.

I tried the 14b model and it works flawlessly. I guess it's simply because the 32b isn't fully loaded.

Although I'm curious now: is there any way I can run the 32b model?
When I'm running the 14b, both my GPU and CPU only use around 14% of their processing power.
Performance-wise, my PC should be able to run the 32b if I have enough VRAM.


@rick-github commented on GitHub (Jan 30, 2025):

> my PC should be able to run the 32b if I have enough VRAM

Yes. The problem is you don't have enough VRAM to run the 32b model.


@BananasMan commented on GitHub (Jan 31, 2025):

> > my PC should be able to run the 32b if I have enough VRAM
>
> Yes. The problem is you don't have enough VRAM to run the 32b model.

Okay then, thanks.


@sealad886 commented on GitHub (Feb 5, 2025):

Also, since `ollama ps` is a one-time snapshot of status, and not a constantly updating monitor (à la `htop`), you probably miss the points in time when the GPU is actually doing something. If a model is only partially loaded on the GPU but _mostly_ loaded for CPU processing (or even _mostly_ on GPU with only a few layers on the CPU), the GPU calculations will speed past you, while the CPU calculations take comparative eons.
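
To actually catch those brief GPU bursts, a rolling view works better than a snapshot. A minimal sketch using `nvidia-smi`, which ships with the NVIDIA driver on both Windows and Linux; `-l 1` re-polls every second:

```
# Full status table, refreshed every second
nvidia-smi -l 1

# Or just the fields that matter here, as a compact CSV stream
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1
```

Run it in a second terminal while the model is generating, and you should see utilization spike whenever the GPU-resident layers are being evaluated.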


@erickCantu commented on GitHub (Feb 25, 2025):

Try `nvtop` to monitor your GPU usage.
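
Note that `nvtop` is a Linux tool; a minimal way to get it going on an apt-based distro (the package name may differ elsewhere):

```
sudo apt install nvtop   # Debian/Ubuntu package
nvtop                    # continuously updating GPU monitor, htop-style
```

On Windows, the `nvidia-smi -l 1` loop shown above is the closest built-in equivalent.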


Reference: github-starred/ollama#52120