[GH-ISSUE #12141] Models are not running on GPU, only using CPU #33830

Closed
opened 2026-04-22 16:54:56 -05:00 by GiteaMirror · 6 comments

Originally created by @iloveumyfriend on GitHub (Sep 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12141

What is the issue?

Hello,

I am running Ollama with a GPU available (NVIDIA, confirmed via nvidia-smi), but models seem to only use the CPU.

  • htop shows 100% CPU usage
  • nvidia-smi shows that Ollama is started, but no significant GPU utilization
  • Logs show lines like:
    offloading output layer to CPU
    offloaded 0/49 layers to GPU

Environment:

  • Ollama version: ollama version is 0.11.6
  • GPU: NVIDIA H100 80GB
  • OS: Ubuntu - 24.04.2

nvidia-smi.txt (https://github.com/user-attachments/files/22074736/nvidia-smi.txt)

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.11.6

GiteaMirror added the bug label 2026-04-22 16:54:56 -05:00

@rick-github commented on GitHub (Sep 1, 2025):

Add the full server log.


@iloveumyfriend commented on GitHub (Sep 1, 2025):

journalctl.txt (https://github.com/user-attachments/files/22074865/journalctl.txt)


@rick-github commented on GitHub (Sep 1, 2025):

Sep 01 10:15:07 AI ollama[2128]: time=2025-09-01T10:15:07.233+03:00 level=INFO source=server.go:531 msg=offload library=cpu 

Despite detecting the GPU, the server thinks that the only usable backend is the CPU. The start of the log would show device detection but unfortunately has not been included. Do the following and post the output:

sudo systemctl stop ollama
sudo systemctl start ollama
ollama run gpt-oss:20b ''
journalctl -u ollama --no-pager --since "$(systemctl show ollama --property=ActiveEnterTimestamp --value)"
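
Once the log has been captured, the offload result can be checked without reading the whole file. The following is a minimal sketch, not part of Ollama: `offload_summary` is a hypothetical helper name, and it assumes the log contains lines of the form `offloaded 0/49 layers to GPU`, as quoted in the original report. Other Ollama versions may word this message differently.

```shell
#!/usr/bin/env bash
# Sketch: summarize GPU offload from Ollama server log text read on stdin.
# Assumption: the log reports offload as "offloaded N/M layers to GPU".
offload_summary() {
    local line counts
    # Keep only the most recent offload report; tolerate "no match".
    line=$(grep -oE 'offloaded [0-9]+/[0-9]+ layers to GPU' | tail -n 1 || true)
    if [ -z "$line" ]; then
        echo "no offload line found"
        return 2
    fi
    counts=${line#offloaded }   # e.g. "0/49 layers to GPU"
    counts=${counts%% *}        # e.g. "0/49"
    echo "$counts"
    # Succeed only if at least one layer was placed on the GPU.
    [ "${counts%%/*}" -gt 0 ]
}
```

To use it, pipe the `journalctl` command above into `offload_summary`. It prints the most recent `N/M` pair and exits non-zero when no layers were offloaded or no matching line was found.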

@iloveumyfriend commented on GitHub (Sep 1, 2025):

Thank you for the instructions. After stopping and restarting the Ollama service, the models started running on the GPU as expected. It seems the issue was resolved after restarting.

Do you know what could have caused Ollama to initially fall back to CPU even though the GPU was detected?

journalctl.txt (https://github.com/user-attachments/files/22076293/journalctl.txt)


@rick-github commented on GitHub (Sep 1, 2025):

> Do you know what could have caused Ollama to initially fall back to CPU even though the GPU was detected?

No idea. As mentioned, the previous log may include relevant details.


@iloveumyfriend commented on GitHub (Sep 1, 2025):

Thank you anyway for your help! I’ll keep an eye on the logs in case this happens again.


Reference: github-starred/ollama#33830