[GH-ISSUE #3024] Ollama not using GPU, falling back to CPU #1860

Closed
opened 2026-04-12 11:55:26 -05:00 by GiteaMirror · 7 comments

Originally created by @kopigeek-labs on GitHub (Mar 9, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3024

Originally assigned to: @dhiltgen on GitHub.

I'm running Ollama in a Docker container on Debian. For a llama2 model, my CPU utilization is at 100% while the GPU stays at 0%.

Here is my output from `docker logs ollama`:

```
time=2024-03-09T14:52:42.622Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T14:52:42.623Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T14:52:42.623Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T14:52:42.623Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T14:52:46.425Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cpu_avx rocm_v60000 cpu_avx2 cuda_v11 cpu]"
time=2024-03-09T14:52:46.425Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T14:52:46.425Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T14:52:46.426Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T14:52:46.434Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T14:52:46.434Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T14:52:46.434Z level=INFO source=routes.go:1042 msg="no GPU detected"
time=2024-03-09T15:12:39.692Z level=INFO source=images.go:800 msg="total blobs: 6"
time=2024-03-09T15:12:39.694Z level=INFO source=images.go:807 msg="total unused blobs removed: 6"
time=2024-03-09T15:12:39.695Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T15:12:39.695Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T15:12:43.522Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [rocm_v60000 cpu cpu_avx cpu_avx2 cuda_v11]"
time=2024-03-09T15:12:43.523Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T15:12:43.523Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T15:12:43.525Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T15:12:43.535Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T15:12:43.535Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:12:43.535Z level=INFO source=routes.go:1042 msg="no GPU detected"
time=2024-03-09T15:25:32.983Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T15:25:32.984Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T15:25:32.984Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T15:25:32.985Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T15:25:36.686Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cpu cpu_avx rocm_v60000 cpu_avx2 cuda_v11]"
time=2024-03-09T15:25:36.686Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T15:25:36.686Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T15:25:36.688Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T15:25:36.698Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T15:25:36.698Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:25:36.698Z level=INFO source=routes.go:1042 msg="no GPU detected"
time=2024-03-09T15:28:43.196Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T15:28:43.198Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T15:28:43.198Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T15:28:43.199Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T15:28:46.997Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cpu cpu_avx rocm_v60000 cuda_v11 cpu_avx2]"
time=2024-03-09T15:28:46.997Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T15:28:46.998Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T15:28:46.999Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T15:28:47.010Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T15:28:47.010Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:28:47.010Z level=INFO source=routes.go:1042 msg="no GPU detected"
time=2024-03-09T15:33:09.444Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T15:33:09.444Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T15:33:09.445Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T15:33:09.445Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T15:33:13.264Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cuda_v11 cpu_avx cpu rocm_v60000 cpu_avx2]"
time=2024-03-09T15:33:13.264Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T15:33:13.264Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T15:33:13.278Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T15:33:13.287Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T15:33:13.287Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:33:13.287Z level=INFO source=routes.go:1042 msg="no GPU detected"
...
...
time=2024-03-09T15:36:53.196Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:36:53.196Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T15:36:53.196Z level=INFO source=llm.go:77 msg="GPU not available, falling back to CPU"
loading library /root/.ollama/assets/0.1.28/cpu_avx2/libext_server.so
time=2024-03-09T15:36:53.200Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /root/.ollama/assets/0.1.28/cpu_avx2/libext_server.so"
time=2024-03-09T15:36:53.200Z level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
...
...
llama_kv_cache_init:        CPU KV buffer size =  1024.00 MiB
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_new_context_with_model:        CPU input buffer size   =    13.02 MiB
llama_new_context_with_model:        CPU compute buffer size =   160.00 MiB
```

I can confirm that I have the NVIDIA drivers installed, and also the latest version of the nvidia-container-toolkit:

```
root@docker-debian:/root/docker# nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.14.6
```

`nvidia-smi` output:

```
root@docker-debian:/root/docker# sudo docker run --rm --runtime=nvidia --gpus all \
--device /dev/nvidia0:/dev/nvidia0 \
--device /dev/nvidia1:/dev/nvidia1 \
--device /dev/nvidiactl \
--device /dev/nvidia-modeset \
--device /dev/nvidia-uvm \
debian nvidia-smi
Sat Mar  9 15:53:14 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla M40 24GB                 Off |   00000000:02:00.0 Off |                  Off |
| N/A   38C    P8             16W /  250W |       0MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce GTX 1660 ...    Off |   00000000:03:00.0 Off |                  N/A |
| 51%   42C    P8             12W /  125W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

I'm very new to this and learning! Hope someone can point me in the right direction.

GiteaMirror added the docker, nvidia labels 2026-04-12 11:55:27 -05:00

@aosan commented on GitHub (Mar 11, 2024):

Hi @kopigeek-labs

It seems the problem starts at:

```
time=2024-03-09T14:52:46.434Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
```

It appears your NVIDIA Tesla M40 24GB supports compute capability 5.2, and apparently there is no CUDA support listed for your NVIDIA GeForce GTX 1660.

It's not clear if the CUDA Toolkit is installed; that's easily checked with:

`nvcc --version`

and it should output something similar to:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```

I couldn't determine whether the NVIDIA Tesla M40 24GB works with Driver Version 550.54.14, but one way to check would be to follow the installation and configuration steps for CUDA in this documentation:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

I hope you get Ollama running on your GPU; it's painful to run it on CPU only.


@kopigeek-labs commented on GitHub (Mar 11, 2024):

Thank you @aosan for looking into this. I only had the NVIDIA Container Toolkit installed (I'd thought it included CUDA) but not the CUDA Toolkit. I've followed the instructions to install the CUDA Toolkit. See the `nvcc --version` output below:

```
root@docker-debian:/root# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
```

and

```
root@docker-debian:/# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  550.54.14  Thu Feb 22 01:44:30 UTC 2024
GCC version:  gcc version 12.2.0 (Debian 12.2.0-14)
```

I rebooted and re-ran Ollama, but it is still running painfully slowly on CPU only, with GPU utilization still at 0%. It looks to be the same error in the logs:

```
time=2024-03-09T14:52:42.622Z level=INFO source=images.go:800 msg="total blobs: 0"
time=2024-03-09T14:52:42.623Z level=INFO source=images.go:807 msg="total unused blobs removed: 0"
time=2024-03-09T14:52:42.623Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.28)"
time=2024-03-09T14:52:42.623Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-09T14:52:46.425Z level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [cpu_avx rocm_v60000 cpu_avx2 cuda_v11 cpu]"
time=2024-03-09T14:52:46.425Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-09T14:52:46.425Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-09T14:52:46.426Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14]"
time=2024-03-09T14:52:46.434Z level=INFO source=gpu.go:249 msg="Unable to load CUDA management library /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14: nvml vram init failure: 999"
time=2024-03-09T14:52:46.434Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-09T14:52:46.434Z level=INFO source=routes.go:1042 msg="no GPU detected"
```

@aosan commented on GitHub (Mar 11, 2024):

OK, perhaps NVIDIA Tesla M40 is not supported by CUDA v12.

According to this article, Tesla M40/Maxwell/M Series are supported up to CUDA v11:

https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

I couldn't confirm anything on NVIDIA's site; the link to the M40 is broken:

https://developer.nvidia.com/cuda-gpus
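
For what it's worth, the compute capability of both cards can also be queried straight from the driver; a minimal sketch, assuming the `compute_cap` query field is available in the 550.54.14 driver shown above (NVIDIA's published tables list the Tesla M40 at 5.2 and the GTX 1660 at 7.5):

```
# Ask the driver for each GPU's name and CUDA compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv
```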


@dhiltgen commented on GitHub (Mar 11, 2024):

@kopigeek-labs can you try passing only the GTX 1660 through to the container and see if Ollama is able to discover it without the 999 (unknown) error from the NVIDIA management library? That card has a more modern compute capability.

Another experiment to try is installing Ollama directly on the host and seeing if that works, which would help isolate whether this is a container image/runtime bug.
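
A sketch of both experiments, assuming the GTX 1660 is device index 1 (matching the `nvidia-smi` output above) and reusing the usual Ollama volume/port flags:

```
# Experiment 1: expose only the GTX 1660 to the container
docker run -d --gpus device=1 -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# (The card can also be selected by UUID from `nvidia-smi -L` to avoid index ambiguity.)

# Experiment 2: install Ollama directly on the host to rule out a container image/runtime problem
curl -fsSL https://ollama.com/install.sh | sh
```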


@dhiltgen commented on GitHub (Mar 26, 2024):

One other thing to try: for systems showing the "unknown error" or "999" error on NVIDIA GPUs, check the dmesg logs (`dmesg -l err`) to see if there's anything interesting being reported by the NVIDIA drivers.
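
For example (a minimal sketch; the exact messages vary by driver and kernel version):

```
# Show only error-level kernel messages and filter for the NVIDIA driver (NVRM)
sudo dmesg -l err | grep -iE 'nvrm|nvidia'

# On systemd-based hosts, the kernel journal gives the same view with timestamps
sudo journalctl -k -p err | grep -iE 'nvrm|nvidia'
```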


@dhiltgen commented on GitHub (Apr 12, 2024):

If you're still having troubles, please give the above suggestions a try and let us know.


@iganev commented on GitHub (Apr 29, 2024):

> [quotes the original issue report in full]

Did you try to run the docker container with `--privileged` and to pass the actual device nodes by mounting `-v /dev:/dev`?

That worked for me.

I used to face the same issue, which is what led me to this thread.

To elaborate, here's what I initially tried, which did NOT work:

```
docker run --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

And here's what works for me:

```
docker run --gpus=all -v ollama:/root/.ollama -v /dev:/dev --privileged -p 11434:11434 --name ollama ollama/ollama
```
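
If full `--privileged` plus mounting all of `/dev` feels too broad, a narrower variant of the same idea (an untested sketch that passes only the NVIDIA device nodes already used in the report above) may also work:

```
docker run -d --gpus=all \
  --device /dev/nvidiactl \
  --device /dev/nvidia-modeset \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia0 \
  --device /dev/nvidia1 \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```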

Reference: github-starred/ollama#1860