[GH-ISSUE #11154] Ollama not utilizing GPU, despite ollama ps showing 100% GPU usage #33117

Closed
opened 2026-04-22 15:25:46 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @SmilingPixel on GitHub (Jun 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11154

What is the issue?

I have manually installed Ollama v0.9.2 on an Ubuntu server with 4 RTX 3090s, following the instructions in the [manual](https://github.com/ollama/ollama/blob/main/docs/linux.md).

However, when running a model (Qwen3-4B), I noticed that token generation is very slow, only about 2 tokens per second. I checked the system status:

  1. `ollama ps` reports that the model is loaded on the GPU at 100% usage.
  2. `nvidia-smi` shows no GPU load, while CPU usage is high.

Below is the Ollama log: [log.log](https://github.com/user-attachments/files/20846145/log.log)

And other system status.

```
> ollama ps
NAME        ID              SIZE      PROCESSOR    UNTIL               
qwen3:4b    2bfd38a7daaf    5.2 GB    100% GPU     12 seconds from now
```
```
> nvidia-smi
Sat Jun 21 18:27:21 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:19:00.0 Off |                  N/A |
| 30%   28C    P8             48W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:1A:00.0 Off |                  N/A |
| 30%   30C    P5             95W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        Off |   00000000:67:00.0 Off |                  N/A |
| 30%   32C    P5             96W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        Off |   00000000:68:00.0 Off |                  N/A |
| 30%   32C    P5            100W /  350W |      44MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1797      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      1797      G   /usr/lib/xorg/Xorg                              4MiB |
|    2   N/A  N/A      1797      G   /usr/lib/xorg/Xorg                              4MiB |
|    3   N/A  N/A      1797      G   /usr/lib/xorg/Xorg                             19MiB |
|    3   N/A  N/A      4249      G   /usr/bin/gnome-shell                            6MiB |
+-----------------------------------------------------------------------------------------+
```

Relevant log output

See the attached log file.
I started Ollama with `OLLAMA_HOST=0.0.0.0:11434 OLLAMA_DEBUG=1 ollama serve`.

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.9.2

GiteaMirror added the bug label 2026-04-22 15:25:46 -05:00

@rick-github commented on GitHub (Jun 21, 2025):

```
time=2025-06-21T18:22:10.315+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
time=2025-06-21T18:22:10.315+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
```

No CPU- or GPU-enabled backends were found, so your installation is broken. Since the manual install was done in `/usr` and the server is looking for backends in `/usr/local`, it's possible that the binary was moved post-install, or that a previously installed version is taking precedence. I suggest using the recommended install method: `curl -fsSL https://ollama.com/install.sh | sh`.
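The shadowing described here comes from shell PATH order: `/usr/local/bin` is usually searched before `/usr/bin`, so a stale copy there wins over a fresh install under `/usr`. A minimal, self-contained demonstration with fake binaries in a temp directory (hypothetical paths, not the actual server):

```shell
#!/bin/sh
# Toy demonstration of how a stale binary earlier in PATH shadows a
# fresh one. On a real machine the equivalent checks are
# `command -v ollama` and comparing `ollama -v` with the version you
# just installed.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/usr/bin" "$tmp/usr/local/bin"
printf '#!/bin/sh\necho fresh-install\n' > "$tmp/usr/bin/ollama"       # new copy
printf '#!/bin/sh\necho stale-install\n' > "$tmp/usr/local/bin/ollama" # old copy
chmod +x "$tmp/usr/bin/ollama" "$tmp/usr/local/bin/ollama"
# /usr/local/bin precedes /usr/bin in a typical PATH, so the stale
# copy is the one that actually runs:
result=$(PATH="$tmp/usr/local/bin:$tmp/usr/bin" ollama)
echo "$result"    # stale-install
rm -rf "$tmp"
```

This is why reinstalling to `/usr` alone didn't help: the shell kept resolving `ollama` to the leftover copy in `/usr/local/bin`.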


@SmilingPixel commented on GitHub (Jun 22, 2025):

I checked the Ollama installation and discovered that there were multiple versions installed:

```
> whereis ollama
ollama: /usr/bin/ollama /usr/lib/ollama /usr/local/bin/ollama /usr/local/lib/ollama /usr/local/ollama /usr/share/ollama
```

After removing these directories and reinstalling Ollama, the issue was resolved!

Thank you so much for your help! @rick-github
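For anyone hitting the same problem, the cleanup step can be previewed safely before running anything destructive. This dry-run sketch only prints the commands implied by the `whereis` output above (verify each path against your own system before executing them by hand, with `sudo` where needed):

```shell
#!/bin/sh
# Dry-run: print, but do not execute, the cleanup and reinstall
# commands for the duplicate install paths reported by `whereis`.
paths="/usr/bin/ollama /usr/lib/ollama /usr/local/bin/ollama \
/usr/local/lib/ollama /usr/local/ollama /usr/share/ollama"
cmds="systemctl stop ollama"
for p in $paths; do
    cmds="$cmds
rm -rf $p"
done
cmds="$cmds
curl -fsSL https://ollama.com/install.sh | sh"
echo "$cmds"
```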

Reference: github-starred/ollama#33117