[GH-ISSUE #7902] Using GPU + single CPU or no GPU and all CPUs #51567

Closed
opened 2026-04-28 20:35:35 -05:00 by GiteaMirror · 3 comments

Originally created by @WaarlandIT on GitHub (Dec 1, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7902

What is the issue?

Some models do not use the GPUs at all and run on CPUs only, while others use the GPU plus a single CPU core.
llama3.2 uses the GPU; one CPU core spikes to 100% and the other 11 cores are not used at all.
tinyllama uses CPU cores only, and it uses all 12 cores.

Not sure if this is a bug, but I expected that when the CPUs are used, Ollama would always use all cores and not just one.

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.13

GiteaMirror added the bug label 2026-04-28 20:35:35 -05:00

@rick-github commented on GitHub (Dec 1, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) would aid in debugging, specifically from when you ran llama3.2 and tinyllama. The CPU usage for llama3.2 sounds normal, as the CPU is busy waiting on the GPU, and the CPU usage for tinyllama sounds normal for a model running only in RAM. But if llama3.2 fits on your GPU, then tinyllama should as well, so that bears looking at.
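(Editor's note, not from the original thread: on Linux, the linked troubleshooting steps roughly amount to pulling the server logs from the service journal and checking how the loaded model is split between GPU and CPU. A minimal sketch, assuming Ollama is installed as a systemd service on a machine with an Nvidia GPU:)

```shell
# Collect Ollama server logs (assumes Ollama runs under systemd as the
# "ollama" service, the default for the Linux install script):
journalctl -u ollama --no-pager > ollama-server.log

# Run a model, then inspect how it was loaded. The PROCESSOR column of
# `ollama ps` reports the GPU/CPU split (e.g. "100% GPU" or "100% CPU").
ollama run llama3.2 "hello" > /dev/null
ollama ps

# Independently confirm GPU activity while a prompt is being processed:
nvidia-smi
```

If `ollama ps` shows `100% CPU` for a model that should fit in VRAM, the server log usually explains why it was not offloaded (e.g. the GPU memory was already occupied by another loaded model).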

@WaarlandIT commented on GitHub (Dec 2, 2024):

Thanks for your reply; I will try to recreate the issue and collect the logs.
I have been playing around with many different models to see what was going on.


@WaarlandIT commented on GitHub (Dec 3, 2024):

I think I might have solved the problem. I noticed that it sometimes worked as expected and sometimes it did not.
What you said made me think that the GPU might have been busy at some point when I started tinyllama, forcing it to use the CPU instead.
Ollama is also used by Home Assistant for various tasks; when I stopped that integration, Ollama seemed to work fine.


Reference: github-starred/ollama#51567