[GH-ISSUE #8606] Why doesn't my ollama use GPU #52077

Closed
opened 2026-04-28 21:51:52 -05:00 by GiteaMirror · 21 comments

Originally created by @baotianxia on GitHub (Jan 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8606

I installed the Nvidia driver with `sudo apt install nvidia-driver-xxx`, and ollama reports the model is running on the GPU, but my CPU usage is 100% and GPU usage is 0%.

![Image](https://github.com/user-attachments/assets/d9473bd3-953a-4f12-99d2-36420a7645d5)

![Image](https://github.com/user-attachments/assets/53934bc4-d3c2-4e77-83a3-14a57749edca)

![Image](https://github.com/user-attachments/assets/550849ce-12f8-46fb-a04c-4e23244e6742)

Ubuntu server 24.04


@rick-github commented on GitHub (Jan 27, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.


@baotianxia commented on GitHub (Jan 27, 2025):

> [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.

![Image](https://github.com/user-attachments/assets/44c7c524-d12c-484d-b89a-2c21f4a90c86)

@rick-github commented on GitHub (Jan 27, 2025):

Are you on a Mac?


@baotianxia commented on GitHub (Jan 27, 2025):

> Are you on a Mac?

Ubuntu server 24.04


@baotianxia commented on GitHub (Jan 27, 2025):

> > Are you on a Mac?
>
> Ubuntu server 24.04

I will try to redeploy.


@cleverunit commented on GitHub (Jan 27, 2025):

I encountered the same problem, and chatting with ollama is very slow.


@rick-github commented on GitHub (Jan 27, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.


@prusnak commented on GitHub (Jan 27, 2025):

The Tesla M60 has Compute Capability 5.2, which is not enabled for CUDA 12.x; the fix is in PR https://github.com/ollama/ollama/pull/8567.
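For anyone unsure which compute capability their card has, a quick way to check (assuming a reasonably recent NVIDIA driver, which supports the `compute_cap` query field; older drivers omit it, in which case consult NVIDIA's CUDA GPU list):

```shell
# Print each GPU's name and CUDA compute capability, e.g. "Tesla M60, 5.2".
nvidia-smi --query-gpu=name,compute_cap --format=csv
```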


@cleverunit commented on GitHub (Jan 27, 2025):

Log says:

```
root@autodl-container-6d25459816-0bbc0370:/autodl-pub/data/CASIAWebFace# journalctl -u ollama --no-pager
No journal files were found.
-- No entries --
```


@cleverunit commented on GitHub (Jan 27, 2025):

I'm using an RTX 2080 Ti GPU.


@rick-github commented on GitHub (Jan 27, 2025):

Does your autodl container use systemd? If not, how are you starting ollama? If you start it manually, do you redirect the output? If you redirect the output, that's where the logs are.
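As a sketch of that pattern, one way to keep the logs when starting ollama by hand without systemd (`/tmp/ollama.log` is an arbitrary path chosen for illustration):

```shell
# Start the server in the background, capturing stdout and stderr to a file.
nohup ollama serve > /tmp/ollama.log 2>&1 &

# Later, inspect GPU-related lines from the captured log.
grep -i cuda /tmp/ollama.log
```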


@ghost commented on GitHub (Jan 27, 2025):

Same problem here: ollama shows 100% GPU, but nvidia-smi shows no running process.

Running model: deepseek-r1-7b

GPU: NVIDIA-P104

CUDA: NVIDIA-SMI 565.57.01, Driver Version: 565.57.01, CUDA Version: 12.7

logs:

```
Jan 27 20:21:21 2404 ollama[3370]: time=2025-01-27T20:21:21.430+08:00 level=INFO source=common.go:131 msg="GPU runner incompatible with host system, CPU does not have AVX" runner=cuda_v11_avx
Jan 27 20:21:21 2404 ollama[3370]: time=2025-01-27T20:21:21.430+08:00 level=INFO source=common.go:131 msg="GPU runner incompatible with host system, CPU does not have AVX" runner=cuda_v12_avx
Jan 27 20:21:21 2404 ollama[3370]: time=2025-01-27T20:21:21.540+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-0be6e008-6845-1eab-8b1f-3c7e401d3b56 library=cuda variant=v12 compute=6.1 driver=12.7 name="NVIDIA P104-100" total="7.9 GiB" available="7.8 GiB"
Jan 27 20:21:26 2404 ollama[3370]: time=2025-01-27T20:21:26.914+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[7.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.6 GiB" memory.required.partial="5.6 GiB" memory.required.kv="448.0 MiB" memory.required.allocations="[5.6 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
Jan 27 20:21:26 2404 ollama[3370]: time=2025-01-27T20:21:26.915+08:00 level=INFO source=common.go:131 msg="GPU runner incompatible with host system, CPU does not have AVX" runner=cuda_v11_avx
Jan 27 20:21:26 2404 ollama[3370]: time=2025-01-27T20:21:26.915+08:00 level=INFO source=common.go:131 msg="GPU runner incompatible with host system, CPU does not have AVX" runner=cuda_v12_avx
Jan 27 20:21:26 2404 ollama[3370]: time=2025-01-27T20:21:26.915+08:00 level=INFO source=common.go:131 msg="GPU runner incompatible with host system, CPU does not have AVX" runner=cuda_v11_avx
Jan 27 20:21:26 2404 ollama[3370]: time=2025-01-27T20:21:26.915+08:00 level=INFO source=common.go:131 msg="GPU runner incompatible with host system, CPU does not have AVX" runner=cuda_v12_avx
```


@rick-github commented on GitHub (Jan 27, 2025):

```
Jan 27 20:21:21 2404 ollama[3370]: time=2025-01-27T20:21:21.430+08:00 level=INFO source=common.go:131 msg="GPU runner incompatible with host system, CPU does not have AVX" runner=cuda_v11_avx
```

The runners have been compiled for CPUs with AVX extensions, which apparently your CPU doesn't have. If you are running on a virtual CPU, enable avx/avx2 (e.g., for Proxmox: https://pve.proxmox.com/pve-docs/cpu-models.conf.5.html#_example_file), or see [here](https://github.com/ollama/ollama/blob/main/docs/development.md#advanced-cpu-vector-settings) for instructions on building a custom ollama without AVX.
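A quick way to check whether the (virtual) CPU actually advertises AVX, assuming a Linux guest with `/proc/cpuinfo`:

```shell
# Check whether the CPU advertises the AVX instruction set.
# Without the avx flag, ollama's prebuilt GPU runners are rejected
# at startup and inference falls back to the CPU.
if grep -m1 '^flags' /proc/cpuinfo | grep -qw avx; then
    echo "AVX available"
else
    echo "AVX missing"
fi
```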


@cleverunit commented on GitHub (Jan 27, 2025):

> Does your autodl container use systemd? If not, how are you starting ollama? If you start it manually, do you redirect the output? If you redirect the output, that's where the logs are.

I start ollama with `ollama serve` and chat with it using `ollama run qwen2.5:7b-instruct`.


@ghost commented on GitHub (Jan 27, 2025):

> > Does your autodl container use systemd? If not, how are you starting ollama? If you start it manually, do you redirect the output? If you redirect the output, that's where the logs are.
>
> I start ollama with `ollama serve` and chat with it using `ollama run qwen2.5:7b-instruct`.

Use `journalctl -u ollama --no-pager | grep cuda`.


@rick-github commented on GitHub (Jan 27, 2025):

The terminal where you ran `ollama serve` is displaying the logs.


@ghost commented on GitHub (Jan 27, 2025):

> ```
> Jan 27 20:21:21 2404 ollama[3370]: time=2025-01-27T20:21:21.430+08:00 level=INFO source=common.go:131 msg="GPU runner incompatible with host system, CPU does not have AVX" runner=cuda_v11_avx
> ```
>
> The runners have been compiled for CPUs with AVX extensions, which apparently your CPU doesn't have. If you are running on a virtual CPU, enable avx/avx2 (e.g., for Proxmox: https://pve.proxmox.com/pve-docs/cpu-models.conf.5.html#_example_file), or see [here](https://github.com/ollama/ollama/blob/main/docs/development.md#advanced-cpu-vector-settings) for instructions on building a custom ollama without AVX.

Thanks, I'll try it. The absence of errors in the log made me ignore it.


@baotianxia commented on GitHub (Jan 28, 2025):

> Tesla M60 uses Compute Capability 5.2 which is not enabled for Cuda 12.x - fix is in PR #8567

What should I do? Can you tell me in detail?


@baotianxia commented on GitHub (Jan 28, 2025):

I started over with a fresh system, installing the drivers through ubuntu-drivers, but it still didn't use the GPU. I have already installed CUDA.


@rick-github commented on GitHub (Jan 28, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.

If it is a CUDA architecture issue with v12, you can try using the v11 runner by setting `OLLAMA_LLM_LIBRARY=cuda_v11` in the server environment.
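For a systemd install, one way to set that variable is a service drop-in (the override pattern from ollama's FAQ; the file path below is what `systemctl edit ollama` would create):

```shell
# Add an environment override for the ollama service, then restart it.
sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="OLLAMA_LLM_LIBRARY=cuda_v11"\n' | \
    sudo tee /etc/systemd/system/ollama.service.d/override.conf
sudo systemctl daemon-reload
sudo systemctl restart ollama
```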


@baotianxia commented on GitHub (Jan 28, 2025):

This issue has been resolved.

If the wrong virtual CPU type is used and the AVX instruction set is missing, ollama cannot use the GPU.

Solution: in Proxmox (PVE), change the VM's CPU type to `host`. You can look up the CPU type details online.

Other virtualization software is similar.
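As a sketch of that fix on Proxmox (assuming its `qm` CLI and a VM with ID 100, which is a placeholder), the CPU type can be switched from the host shell:

```shell
# Pass the host CPU model (including its AVX flags) through to VM 100,
# then power-cycle the VM so the guest sees the new CPU flags.
qm set 100 --cpu host
qm stop 100 && qm start 100
```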


Reference: github-starred/ollama#52077