[GH-ISSUE #4242] Ollama not using GPU #64682

Closed
opened 2026-05-03 18:29:54 -05:00 by GiteaMirror · 14 comments

Originally created by @ziqizh on GitHub (May 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4242

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I am running llama3 8B Q4, but it does not run on the GPU. Here is my system information:

GPU: 10GB VRAM RTX 3080
OS: Ubuntu 22.04
CUDA version (from nvcc): 11.8
NVIDIA driver version: 545.23.06

I tried the installation script and Docker (sudo docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama) and observed no GPU usage.

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.1.34

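Before digging into Ollama's own logs, it can help to confirm that the container runtime exposes the GPU at all. A minimal sketch, assuming the NVIDIA Container Toolkit is installed and the container name used in the command above:

```bash
# Check that Docker's NVIDIA runtime can expose the GPU to an arbitrary container.
# The CUDA image tag here is only an example; any recent nvidia/cuda base tag works.
docker run --rm --gpus=all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# If the ollama container is already running, the NVIDIA runtime normally injects
# nvidia-smi into it, so the same check can be run in place:
docker exec ollama nvidia-smi
```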
GiteaMirror added the docker, bug, nvidia labels 2026-05-03 18:29:55 -05:00

@manojmanivannan commented on GitHub (May 8, 2024):

Something similar on my side: I have an RTX 4090, and running Ollama in Docker does not recognize my NVIDIA GPU.

~/A/ollama $ docker run -d -e CUDA_VISIBLE_DEVICES=0 --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
5eea5106ec6d75e36a01e0c2ac88f3877f1899e99fb4a23ffe54b0aaba1b6a66
~/A/ollama $ docker logs ollama -f                                                                                       ⌚ 12:49:18
time=2024-05-08T11:49:18.380Z level=INFO source=images.go:828 msg="total blobs: 5"
time=2024-05-08T11:49:18.381Z level=INFO source=images.go:835 msg="total unused blobs removed: 0"
time=2024-05-08T11:49:18.381Z level=INFO source=routes.go:1071 msg="Listening on [::]:11434 (version 0.1.33)"
time=2024-05-08T11:49:18.381Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2592519153/runners
time=2024-05-08T11:49:20.151Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60002 cpu cpu_avx cpu_avx2]"
time=2024-05-08T11:49:20.151Z level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-05-08T11:49:20.315Z level=INFO source=gpu.go:101 msg="detected GPUs" library=/tmp/ollama2592519153/runners/cuda_v11/libcudart.so.11.0 count=1
time=2024-05-08T11:49:20.315Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-08T11:49:20.427Z level=WARN source=amd_linux.go:49 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-08T11:49:20.427Z level=WARN source=amd_linux.go:143 msg="amdgpu too old gfx000" gpu=0
time=2024-05-08T11:49:20.427Z level=INFO source=amd_linux.go:286 msg="no compatible amdgpu devices detected"

output of nvidia-smi

~/A/ollama $ nvidia-smi                                                                                                  ⌚ 12:52:23
Wed May  8 12:52:25 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   41C    P5             28W /  450W |     146MiB /  24564MiB |     70%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2904      G   /usr/lib/xorg/Xorg                            133MiB |
+-----------------------------------------------------------------------------------------+
~/A/ollama $   

OS: Ubuntu 22.04
CPU: AMD
GPU: NVIDIA 4090


@lujx1024 commented on GitHub (May 8, 2024):

I have the same problem on Ubuntu 20.04, with an NVIDIA RTX 2060 12GB GPU installed.

I've double-checked that I installed the driver, Docker Engine, and NVIDIA container toolkit correctly.

I looked through the Troubleshooting page and found nothing, so if you know what's going on here, please let me know. I would be very appreciative.


@dhiltgen commented on GitHub (May 8, 2024):

We've adjusted the GPU discovery logic in 0.1.34 to use a different NVIDIA library (the Driver API), which should hopefully make it more reliable. Can you all please try pulling the latest ollama/ollama image (or use the explicit tag ollama/ollama:0.1.34) and see if it discovers your GPUs correctly now? If not, please run the container with -e OLLAMA_DEBUG=1 and share the log so we can see what the problem may be.

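For reference, a minimal sketch of the steps being asked for here, assuming the same container name and volume as in the original report:

```bash
# Pull the updated image (or pin the explicit 0.1.34 tag mentioned above).
docker pull ollama/ollama:0.1.34

# Recreate the container with debug logging enabled so GPU discovery details show up.
docker rm -f ollama
docker run -d --gpus=all -e OLLAMA_DEBUG=1 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:0.1.34

# Collect the log to attach to the issue.
docker logs ollama
```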

@ziqizh commented on GitHub (May 8, 2024):

Debug log from version 0.1.34:

time=2024-05-08T21:50:55.140Z level=INFO source=images.go:904 msg="total unused blobs removed: 0"
time=2024-05-08T21:50:55.141Z level=INFO source=routes.go:1034 msg="Listening on [::]:11434 (version 0.1.34)"
time=2024-05-08T21:50:55.141Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2923388324/runners
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_t.gz
time=2024-05-08T21:50:55.141Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/cpu
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/cpu_avx
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/cpu_avx2
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/cuda_v11
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2923388324/runners/rocm_v60002
time=2024-05-08T21:50:57.928Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-05-08T21:50:57.928Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-05-08T21:50:57.928Z level=DEBUG source=sched.go:85 msg="starting llm scheduler"
time=2024-05-08T21:50:57.928Z level=INFO source=gpu.go:122 msg="Detecting GPUs"
time=2024-05-08T21:50:57.928Z level=DEBUG source=gpu.go:255 msg="Searching for GPU library" name=libcuda.so*
time=2024-05-08T21:50:57.928Z level=DEBUG source=gpu.go:274 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcuda.so** /usr/local/nvidia/lib64/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-05-08T21:50:57.929Z level=DEBUG source=gpu.go:307 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.545.23.06]
cuInit err: 999
time=2024-05-08T21:50:57.929Z level=DEBUG source=gpu.go:307 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.545.23.06]
cuInit err: 999
time=2024-05-08T21:50:57.932Z level=DEBUG source=gpu.go:336 msg="Unable to load nvcuda" library=/usr/lib/x86_64-linux-gnu/libcuda.so.545.23.06 error="nvcuda init failure: 999"
time=2024-05-08T21:50:57.932Z level=DEBUG source=gpu.go:255 msg="Searching for GPU library" name=libcudart.so*
time=2024-05-08T21:50:57.932Z level=DEBUG source=gpu.go:274 msg="gpu library search" globs="[/usr/local/nvidia/lib/libcudart.so** /usr/local/nvidia/lib64/libcudart.so** /tmp/ollama2923388324/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-05-08T21:50:57.932Z level=DEBUG source=gpu.go:307 msg="discovered GPU libraries" paths=[/tmp/ollama2923388324/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 999
time=2024-05-08T21:50:57.934Z level=DEBUG source=gpu.go:319 msg="Unable to load cudart" library=/tmp/ollama2923388324/runners/cuda_v11/libcudart.so.11.0 error="cudart init failure: 999"
time=2024-05-08T21:50:57.934Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-08T21:50:57.934Z level=DEBUG source=amd_linux.go:297 msg="amdgpu driver not detected /sys/module/amdgpu"

BTW, I am able to run llama.cpp and LM Studio on the GPU, so CUDA should be working.


@dhiltgen commented on GitHub (May 8, 2024):

@ziqizh What I've heard is this typically points to a driver bug, but you mention other tools are working properly inside a container on the same host. Can you try any/all of the following and see if it resolves the problem, or gives us more details on the failure?

  • Does a reboot change the failure mode?
  • Can you upgrade to the latest driver? (if already running the latest driver, re-install the driver)
  • Set CUDA_ERROR_LEVEL=50 and try again to see if we get more diagnostic logs
  • Check dmesg for any errors sudo dmesg | grep -i nvrm and sudo dmesg | grep -i nvidia
  • Is the uvm driver not loaded? sudo nvidia-modprobe -u
  • Try reloading the nvidia_uvm driver - sudo rmmod nvidia_uvm then sudo modprobe nvidia_uvm
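
Taken together, the checks above can be run roughly as follows (a sketch only; run on the host, not inside the container, with the container name and image from the earlier commands):

```bash
# Look for NVIDIA driver errors in the kernel log.
sudo dmesg | grep -i nvrm
sudo dmesg | grep -i nvidia

# Create the unified-memory (uvm) device nodes if they are missing.
sudo nvidia-modprobe -u

# Reload the nvidia_uvm module; a stale module state is a common cause of cuInit error 999.
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

# Re-run the container with extra CUDA diagnostics enabled and capture the log.
docker rm -f ollama
docker run -d --gpus=all -e OLLAMA_DEBUG=1 -e CUDA_ERROR_LEVEL=50 \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker logs ollama
```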

@lujx1024 commented on GitHub (May 9, 2024):

@dhiltgen It works.
I removed the old Docker image and pulled the latest image by explicit tag ollama/ollama:0.1.34. It still could not discover the GPU at first, but after I rebooted the machine and restarted the container, the problem was solved. Thanks for your effort, much appreciated.

Here is the monitor data showing the GPU is discovered:
[screenshot: ollama_running_003]


@farheinheigt commented on GitHub (May 9, 2024):

Great, the update works well for me; the GPU is working perfectly now. Thanks!


@dhiltgen commented on GitHub (May 9, 2024):

That's great to hear!

I'll update our docs to include these workaround/troubleshooting steps for others who may hit these problems as well.


@manojmanivannan commented on GitHub (May 15, 2024):

I can confirm this as well; using the latest ollama image solves the issue for me:

~ $ docker run -d --gpus=all -v ollama:/root/.ollama -p 11435:11434 --name ollama ollama/ollama                                                                                                                                                                  ⌚ 6:07:27
61671828322b09f4a48da9635ffec3e5fee52d367385d8a3353cb0880c3a7a7a
~ $ docker logs ollama -f                                                                                                                                                                                                                                        ⌚ 6:07:30
2024/05/15 17:07:30 routes.go:1006: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-05-15T17:07:30.376Z level=INFO source=images.go:704 msg="total blobs: 0"
time=2024-05-15T17:07:30.376Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-15T17:07:30.376Z level=INFO source=routes.go:1052 msg="Listening on [::]:11434 (version 0.1.37)"
time=2024-05-15T17:07:30.376Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2950159761/runners
time=2024-05-15T17:07:32.115Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-05-15T17:07:32.303Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-15T17:07:32.303Z level=WARN source=amd_linux.go:163 msg="amdgpu too old gfx000" gpu=0
time=2024-05-15T17:07:32.303Z level=INFO source=amd_linux.go:311 msg="no compatible amdgpu devices detected"
time=2024-05-15T17:07:32.303Z level=INFO source=types.go:71 msg="inference compute" id=GPU-83112007-904c-7153-a250-6fce07a5db13 library=cuda compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4090" total="23.4 GiB" available="22.9 GiB"


@2loki4u commented on GitHub (Jul 7, 2024):

Greetings all. Full disclosure, be gentle: I'm a novice in Linux as well as in most aspects of running a dedicated AI server. I've built a dedicated workstation for self-hosted AI (Ryzen 7900X, 64 GB DDR, 4070 Ti Super, M.2 4x4), OS: Mint 21.3 (Ubuntu 22.04).

I'm struggling to resolve an issue where some llama models fully utilize the GPU and some do not. For example, llama3:latest fully utilizes the GPU, as does llama2:latest, but neither mixtral nor llama3:70b even touches the GPU; they just peg most if not all cores on the 7900X.

I'm running the latest Ollama build, 0.1.48, with NVIDIA 550.90.07 drivers; NVIDIA is set to "on-demand". On installing 0.1.48 the machine reports the NVIDIA GPU as detected (obviously, given that 2 of the 4 models use it extensively).

Ollama is installed directly on Linux (not in a Docker container). I am using a Docker container for Open WebUI, and I see the same behavior depending on which model I choose.

Reading through this thread, I noted that the other users were running Ollama in a Docker container, whereas I am not. I wasn't sure if this would change the method of troubleshooting, but frankly I am at a loss, as I cannot explain why it would behave this way.

Questions:
Could it be that the larger models, being greater than the available VRAM of this GPU, cause Ollama to fall back to the CPU and system memory? If so, is this normal? Is there a workaround to force utilization of the GPU?
Should I delete the large models previously downloaded and re-download them? (This seems highly unlikely, unless they get marked in some way when downloaded as to which resources to use.)
Could something else be going on here?

Thanks in advance. If you need additional information, please provide instructions with your ask (or just reference where to get it).


@Trijeet commented on GitHub (Feb 5, 2025):

Questions: Could it be that the larger models being greater than the available VRAM of this GPU causes ollama to defaults back to the CPU and system memory? If so, is this normal? Is there a work around to force utilization of the GPU? Should I delete the large models previously downloaded and re-download them? (seems highly unlikely unless they get marked in some way when downloaded, as to what resources to use) Could something else be going on here?

Hi, is there any answer to this? This seems to be the case even now. Are there any workarounds for ollama to use both GPU and CPU for models larger than available VRAM?


@2loki4u commented on GitHub (Feb 6, 2025):

Not that I've found. A colleague of mine was trying to do the same thing and hit the same roadblock just a week or so ago.




@cfengseagate commented on GitHub (Aug 14, 2025):

Just a side note: I have CUDA 11.8 and cannot upgrade it. Ollama 0.3.14 can use the GPU, but 0.9+ cannot. I didn't test all versions, but apparently there is a compatibility relationship between the Ollama version and the CUDA version.

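For anyone in a similar position, a hedged sketch of that workaround is simply pinning the release reported above to work with the host's CUDA stack, rather than tracking latest, e.g. with Docker:

```bash
# Pin the specific Ollama release the commenter reports still works with CUDA 11.8.
docker pull ollama/ollama:0.3.14
docker run -d --gpus=all \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:0.3.14
```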

@Mikhail42 commented on GitHub (Aug 30, 2025):

See also https://github.com/ollama/ollama/issues/5464
