[GH-ISSUE #6914] Work done by CPU instead of GPU #66421

Open
opened 2026-05-04 04:53:24 -05:00 by GiteaMirror · 0 comments

Originally created by @Iliceth on GitHub (Sep 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6914

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I'm aware this might not be a bug, but I'm trying to understand what is happening and figure out whether I can and/or should change something. This is what I see when I run Reflection 70b:

CPU: 8 cores 100% utilization
RAM: 23 of 32 GB in use
GPU: on average 5% utilization
VRAM: 23 of 24 GB in use

I assume the model does not fit in VRAM and is therefore split across VRAM and RAM, which I'm fine with. What seems odd to me, although it might be normal, is that the CPU appears to be doing almost all of the heavy lifting.
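A rough back-of-envelope sketch may help explain why the GPU can look nearly idle in a split like this. It assumes that token generation is memory-bandwidth bound: each device spends roughly (GB of weights it holds) / (its memory bandwidth) per token, and the GPU and CPU portions of the layer stack run one after the other rather than in parallel. The helper `split_estimate` is hypothetical, and the inputs are assumed values, not measurements from this issue or from Ollama's scheduler: about 40 GB of 4-bit weights, about 22 GB of them on the 24 GB card (the remaining VRAM going to KV cache and buffers), about 900 GB/s of GPU bandwidth, and about 50 GB/s of system RAM bandwidth.

```python
# Back-of-envelope model of a partially offloaded LLM (all inputs hypothetical).
# Assumption: generation is memory-bandwidth bound, and the GPU and CPU
# portions of the layer stack are processed sequentially for each token.

def split_estimate(model_gb, gpu_weights_gb, gpu_bw_gbps, cpu_bw_gbps):
    """Return (gpu_seconds, cpu_seconds, gpu_busy_fraction) per generated token."""
    # Whatever does not fit on the GPU is assumed to stay in system RAM.
    cpu_weights_gb = max(model_gb - gpu_weights_gb, 0.0)
    # Time each device spends streaming its share of the weights once.
    gpu_s = gpu_weights_gb / gpu_bw_gbps
    cpu_s = cpu_weights_gb / cpu_bw_gbps
    total = gpu_s + cpu_s
    # Fraction of each token's wall time during which the GPU is working.
    return gpu_s, cpu_s, gpu_s / total


# Assumed values, not measured in this issue.
gpu_s, cpu_s, busy = split_estimate(40.0, 22.0, 900.0, 50.0)
print(f"GPU {gpu_s * 1000:.0f} ms, CPU {cpu_s * 1000:.0f} ms, GPU busy {busy:.0%}")
```

With these assumed numbers it prints `GPU 24 ms, CPU 360 ms, GPU busy 6%`: the GPU finishes its share of the layers quickly and then waits on the CPU, which is in the same ballpark as the ~5% GPU utilization reported above. If the real model size or bandwidths differ, the split will shift, but as long as a large part of the weights sits in system RAM, the CPU side is likely to dominate.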

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.11

GiteaMirror added the question label 2026-05-04 04:53:24 -05:00
Reference: github-starred/ollama#66421