[GH-ISSUE #3024] Ollama not using GPU, falling back to CPU #1860

Closed
opened 2026-04-12 11:55:26 -05:00 by GiteaMirror · 7 comments

Originally created by @kopigeek-labs on GitHub (Mar 9, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3024

Originally assigned to: @dhiltgen on GitHub.

I'm running Ollama in a Docker container on Debian. For a llama2 model, my CPU utilization is at 100% while the GPU stays at 0%.

Here is my output from `docker logs ollama`:

```
time=2024-03-09T14:52:42.622Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T14:52:42.623Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T14:52:42.623Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T14:52:42.623Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T14:52:46.425Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cpu_avx rocm_v60000 cpu_avx2 cuda_v11 cpu]"
time=2024-03-09T14:52:46.425Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T14:52:46.425Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T14:52:46.426Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T14:52:46.434Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T14:52:46.434Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T14:52:46.434Z level=INFO source=routes.go:1042 msg="no GPU detected"
time=2024-03-09T15:12:39.692Z level=INFO source=images.go:800 msg="total blobs: 6"
time=2024-03-09T15:12:39.694Z level=INFO source=images.go:807 msg="total unused blobs removed: 6"
time=2024-03-09T15:12:39.695Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T15:12:39.695Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T15:12:43.522Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [rocm_v60000 cpu cpu_avx cpu_avx2 cuda_v11]"
time=2024-03-09T15:12:43.523Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T15:12:43.523Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T15:12:43.525Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T15:12:43.535Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T15:12:43.535Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:12:43.535Z level=INFO source=routes.go:1042 msg="no GPU detected"
time=2024-03-09T15:25:32.983Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T15:25:32.984Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T15:25:32.984Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T15:25:32.985Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T15:25:36.686Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cpu cpu_avx rocm_v60000 cpu_avx2 cuda_v11]"
time=2024-03-09T15:25:36.686Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T15:25:36.686Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T15:25:36.688Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T15:25:36.698Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T15:25:36.698Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:25:36.698Z level=INFO source=routes.go:1042 msg="no GPU detected"
time=2024-03-09T15:28:43.196Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T15:28:43.198Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T15:28:43.198Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T15:28:43.199Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T15:28:46.997Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cpu cpu_avx rocm_v60000 cuda_v11 cpu_avx2]"
time=2024-03-09T15:28:46.997Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T15:28:46.998Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T15:28:46.999Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T15:28:47.010Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T15:28:47.010Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:28:47.010Z level=INFO source=routes.go:1042 msg="no GPU detected"
time=2024-03-09T15:33:09.444Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T15:33:09.444Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T15:33:09.445Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T15:33:09.445Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T15:33:13.264Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cuda_v11 cpu_avx cpu rocm_v60000 cpu_avx2]"
time=2024-03-09T15:33:13.264Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T15:33:13.264Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T15:33:13.278Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T15:33:13.287Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T15:33:13.287Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:33:13.287Z level=INFO source=routes.go:1042 msg="no GPU detected"
...
...
time=2024-03-09T15:36:53.196Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:36:53.196Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:36:53.196Z level=INFO source=llm.go:77 msg="GPU not available, falling back to CPU"
loading library /root/.ollama/assets/0.1.28/cpu_avx2/libext_server.so
time=2024-03-09T15:36:53.200Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /root/.ollama/assets/0.1.28/cpu_avx2/libext_server.so"
time=2024-03-09T15:36:53.200Z level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
...
...
llama_kv_cache_init:        CPU KV buffer size =  1024.00 MiB
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_new_context_with_model:        CPU input buffer size   =    13.02 MiB
llama_new_context_with_model:        CPU compute buffer size =   160.00 MiB
```

I can confirm that I have the NVIDIA drivers installed, and also the latest version of the nvidia-container-toolkit:

```
root@docker-debian:/root/docker# nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.14.6
```

`nvidia-smi` output:

```
root@docker-debian:/root/docker# sudo docker run --rm --runtime=nvidia --gpus all \
--device /dev/nvidia0:/dev/nvidia0 \
--device /dev/nvidia1:/dev/nvidia1 \
--device /dev/nvidiactl \
--device /dev/nvidia-modeset \
--device /dev/nvidia-uvm \
debian nvidia-smi
Sat Mar  9 15:53:14 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla M40 24GB                 Off |   00000000:02:00.0 Off |                  Off |
| N/A   38C    P8             16W /  250W |       0MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce GTX 1660 ...    Off |   00000000:03:00.0 Off |                  N/A |
| 51%   42C    P8             12W /  125W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

I'm very new to this and learning! Hope someone can point me in the right direction.

GiteaMirror added the docker, nvidia labels 2026-04-12 11:55:27 -05:00

@aosan commented on GitHub (Mar 11, 2024):

Hi @kopigeek-labs

It seems the problem starts at:

```
time=2024-03-09T14:52:46.434Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
```

It appears your NVIDIA Tesla M40 24GB supports compute capability 5.2, and apparently there is no CUDA support listed for your NVIDIA GeForce GTX 1660.

It's not clear if the CUDA Toolkit is installed; that's easily checked with:

`nvcc --version`

and it should output something similar to:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```

I couldn't determine whether the NVIDIA Tesla M40 24GB works with Driver Version 550.54.14, but one way to check would be to follow the installation and configuration steps for CUDA in this documentation:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

I hope you get Ollama running on your GPU; it's painful to run it on CPU only.


@kopigeek-labs commented on GitHub (Mar 11, 2024):

Thank you @aosan for looking into this. I only had the NVIDIA Container Toolkit installed (I'd thought it included CUDA) but not the CUDA Toolkit. I've followed the instructions to install the CUDA Toolkit. See the `nvcc --version` output below:

```
root@docker-debian:/root# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
```

and

```
root@docker-debian:/# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  550.54.14  Thu Feb 22 01:44:30 UTC 2024
GCC version:  gcc version 12.2.0 (Debian 12.2.0-14)
```

I rebooted and re-ran Ollama, but it is still running painfully slowly on CPU only, with GPU utilization still at 0%. It looks to be the same error in the logs:

```
time=2024-03-09T14:52:42.622Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T14:52:42.623Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T14:52:42.623Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T14:52:42.623Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T14:52:46.425Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cpu_avx rocm_v60000 cpu_avx2 cuda_v11 cpu]"
time=2024-03-09T14:52:46.425Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T14:52:46.425Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T14:52:46.426Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T14:52:46.434Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T14:52:46.434Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T14:52:46.434Z level=INFO source=routes.go:1042 msg="no GPU detected"
```

@aosan commented on GitHub (Mar 11, 2024):

OK, perhaps NVIDIA Tesla M40 is not supported by CUDA v12.

According to this article, Tesla M40/Maxwell/M Series are supported up to CUDA v11:

https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

I couldn't confirm anything on NVIDIA's site; the link to the M40 is broken:

https://developer.nvidia.com/cuda-gpus
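
For what it's worth, the compute capability of both cards can also be queried straight from the driver; a minimal sketch, assuming the `compute_cap` query field is available in the 550.54.14 driver shown above (NVIDIA's published tables list the Tesla M40 at 5.2 and the GTX 1660 at 7.5):

```
# Ask the driver for each GPU's name and CUDA compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv
```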


@dhiltgen commented on GitHub (Mar 11, 2024):

@kopigeek-labs can you try passing only the GTX 1660 through to the container and see if Ollama is able to discover it without the 999 (unknown) error from the NVIDIA management library? That card has a more modern compute capability.

Another experiment to try is installing Ollama directly on the host and seeing if that works, which would help isolate whether this is a container image/runtime bug.
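
A sketch of both experiments, assuming the GTX 1660 is device index 1 (matching the `nvidia-smi` output above) and reusing the usual Ollama volume/port flags:

```
# Experiment 1: expose only the GTX 1660 to the container
docker run -d --gpus device=1 -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# (The card can also be selected by UUID from `nvidia-smi -L` to avoid index ambiguity.)

# Experiment 2: install Ollama directly on the host to rule out a container image/runtime problem
curl -fsSL https://ollama.com/install.sh | sh
```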


@dhiltgen commented on GitHub (Mar 26, 2024):

One other thing to try: for systems showing the "unknown error" or "999" error on NVIDIA GPUs, check the dmesg logs (`dmesg -l err`) to see if there's anything interesting being reported by the NVIDIA drivers.
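
For example (a minimal sketch; the exact messages vary by driver and kernel version):

```
# Show only error-level kernel messages and filter for the NVIDIA driver (NVRM)
sudo dmesg -l err | grep -iE 'nvrm|nvidia'

# On systemd-based hosts, the kernel journal gives the same view with timestamps
sudo journalctl -k -p err | grep -iE 'nvrm|nvidia'
```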


@dhiltgen commented on GitHub (Apr 12, 2024):

If you're still having troubles, please give the above suggestions a try and let us know.


@iganev commented on GitHub (Apr 29, 2024):

> [quotes the original issue report in full]

Did you try to run the docker container with `--privileged` and to pass the actual device nodes by mounting `-v /dev:/dev`?

That worked for me.

I used to face the same issue, which is what led me to this thread.

To elaborate, here's what I initially tried, which did NOT work:

```
docker run --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

And here's what works for me:

```
docker run --gpus=all -v ollama:/root/.ollama -v /dev:/dev --privileged -p 11434:11434 --name ollama ollama/ollama
```
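
If full `--privileged` plus mounting all of `/dev` feels too broad, a narrower variant of the same idea (an untested sketch that passes only the NVIDIA device nodes already used in the report above) may also work:

```
docker run -d --gpus=all \
  --device /dev/nvidiactl \
  --device /dev/nvidia-modeset \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia0 \
  --device /dev/nvidia1 \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```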

Reference: github-starred/ollama#1860