[GH-ISSUE #10502] Compute loads are shifted from GPU to CPU, to never return to GPU #32669

Closed
opened 2026-04-22 14:22:27 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @ctrlaltprocrastinate on GitHub (Apr 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10502

What is the issue?

First, this is not a new issue and has been bugging me for a while now. I've tried various things, including running Ollama in a container and limiting the Ollama process to a single CPU core, but the end result is always the same:

When starting long-running tasks with Ollama, after a few seconds they are shifted to the CPU and Ollama makes only minimal use of the GPU (idling in the single-digit % load range, sometimes going up to around 25% for very short periods). From that point on, the GPU pretty much sits idle until a new model is loaded or Ollama is restarted. For me this is reproducible 100% of the time with long-running tasks, regardless of the client (open-webui, avante, cline, roo). It feels like the GPU is actually managing the CPU and not the other way around, because the CPU runs full tilt on all cores (if available).

This issue probably affects all older hybrid machines like mine and is 100% reproducible on my Tiger Lake-based laptop, which comes with a discrete NVIDIA GeForce RTX 3070 Mobile / Max-Q (8 GB) chip.
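
A quick way to confirm where a loaded model actually ended up, rather than inferring it from load graphs, is `ollama ps` together with `nvidia-smi`; a minimal check (model names and the exact split will of course vary per setup):

```shell
# The PROCESSOR column reports the split per loaded model, e.g.
# "100% GPU", "100% CPU", or something like "48%/52% CPU/GPU"
# for a partially offloaded model.
ollama ps

# Cross-check with the driver's view of VRAM usage and GPU utilization.
nvidia-smi
```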

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.6.6

GiteaMirror added the needs more info, bug labels 2026-04-22 14:22:29 -05:00
Author
Owner

@rick-github commented on GitHub (Apr 30, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.
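
For reference, the linked troubleshooting doc boils down to roughly the following on a systemd-based Linux install (the service name assumes the standard install script):

```shell
# Dump the server log for the standard systemd service install.
journalctl -u ollama --no-pager

# Or run the server in the foreground with verbose logging enabled.
OLLAMA_DEBUG=1 ollama serve
```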

Author
Owner

@gschwind commented on GitHub (May 2, 2025):

Hello,

I have a similar issue after updating to cuda-12-8. It seems to happen with several Ollama versions, so I think it's related to CUDA.

I cannot revert my server to cuda-12-6 to check whether it's actually related to the CUDA update, because cuda-12-6 cannot be installed alongside cuda-12-8 and my server is in production (and the Ollama stuff is actually not a priority :) )

On the other hand, I can test an Ollama that I build from the GitHub tree.

Other AI stuff is working properly.

Best regards.
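
One way to sanity-check which CUDA runtime a given build actually links against is to inspect the backend library's linkage (the path and library name below are only illustrative; install and build layouts differ):

```shell
# Hypothetical location of the CUDA backend; adjust to your layout.
ldd /usr/local/lib/ollama/libggml-cuda.so | grep -iE 'cuda|cublas'

# The driver header also shows the maximum CUDA version it supports.
nvidia-smi | head -n 4
```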

Author
Owner

@rick-github commented on GitHub (May 2, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.

Author
Owner

@gschwind commented on GitHub (May 7, 2025):

Hello,

I gathered this log using version v0.6.6, starting the Ollama server and then running llama3.1.

Best regards

[stderr-20250507.log](https://github.com/user-attachments/files/20080849/stderr-20250507.log)

Author
Owner

@gschwind commented on GitHub (May 7, 2025):

Same issue using v0.6.8.

Best regards

Author
Owner

@rick-github commented on GitHub (May 7, 2025):

```
time=2025-05-07T11:21:58.185+02:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/opt/ollama/ollama
time=2025-05-07T11:21:58.186+02:00 level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
```

ollama was unable to find any backends. ollama looks for the backend libraries relative to the path of the binary, so if the binary or the libraries have been moved and it can't find them, it falls back to using the CPU. The expected layout looks like:

```
/xxx/yyy/zzz/ollama
/xxx/yyy/lib/ollama/libggml-base.so
```
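
A quick way to check that layout on a given install (the relative path is a sketch based on the standard packaging, where the libraries live in `lib/ollama` beside the binary's `bin` directory):

```shell
# Resolve where the ollama binary lives, then list the sibling backend
# directory; expect libggml-base.so plus CPU and/or GPU backends. An
# empty or missing directory explains a CPU-only fallback.
BIN="$(command -v ollama)"
ls -l "$(dirname "$BIN")/../lib/ollama/"
```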

Author
Owner

@gschwind commented on GitHub (May 12, 2025):

@rick-github Thanks, that was the issue in my case; I hadn't noticed that I also have to build the backend.

Could a debug message suggesting that the backend may not be installed or built properly help here?

Best regards

Reference: github-starred/ollama#32669