[GH-ISSUE #9819] Fails to use GPU #68485

Closed
opened 2026-05-04 14:08:21 -05:00 by GiteaMirror · 2 comments

Originally created by @yongjer on GitHub (Mar 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9819

What is the issue?

Start the container:

docker run -d \
    --gpus all \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    -e OLLAMA_FLASH_ATTENTION=1 \
    ollama/ollama

Running ollama ps shows the whole model running on the CPU.
GPU: RTX 4060 Ti
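
A first sanity check, assuming the container is named ollama as in the command above, is whether the driver is reachable inside the container at all:

docker exec -it ollama nvidia-smi

The first comment below shows the output of exactly this check.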

Relevant log output

container log:

2025/03/17 11:20:06 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-17T11:20:06.479Z level=INFO source=images.go:432 msg="total blobs: 31"
time=2025-03-17T11:20:06.479Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-17T11:20:06.480Z level=INFO source=routes.go:1297 msg="Listening on [::]:11434 (version 0.6.1)"
time=2025-03-17T11:20:06.480Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-17T11:20:06.482Z level=WARN source=gpu.go:605 msg="unknown error initializing cuda driver library /usr/lib/x86_64-linux-gnu/libcuda.so.570.86.15: cuda driver library init failure: 999. see https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for more information"
time=2025-03-17T11:20:06.483Z level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-03-17T11:20:06.483Z level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="31.2 GiB" available="23.4 GiB"
[GIN] 2025/03/17 - 11:20:13 | 200 |      42.099µs |       127.0.0.1 | GET      "/api/version"
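
The decisive line above is the gpu.go WARN: initializing the CUDA driver library fails with error 999 (CUDA_ERROR_UNKNOWN), so Ollama falls back to the CPU even though the host has a GPU. When nvidia-smi works inside the container but the CUDA driver still fails to initialize, one common cause is that Docker is not configured to use the NVIDIA container runtime. A quick check, assuming a standard Linux Docker install:

docker info | grep -i runtimes    # should list an "nvidia" runtime once the toolkit is configured
cat /etc/docker/daemon.json       # should contain a "nvidia" entry under "runtimes"

This is consistent with the fix that resolved the issue (see the last comment).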

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.6.1

GiteaMirror added the bug label 2026-05-04 14:08:21 -05:00

@yongjer commented on GitHub (Mar 17, 2025):

Using nvidia-smi in Docker:

Mon Mar 17 11:27:12 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15              Driver Version: 570.86.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:01:00.0  On |                  N/A |
|  0%   37C    P8             13W /  165W |     786MiB /  16380MiB |     36%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

@yongjer commented on GitHub (Mar 17, 2025):

Solved it by running sudo nvidia-ctk runtime configure --runtime=docker, then restarting Docker.
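
For context, a sketch of what this fix does, assuming a standard Linux install: nvidia-ctk runtime configure --runtime=docker registers the NVIDIA runtime in /etc/docker/daemon.json, roughly producing:

{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}

followed by a restart of the Docker daemon so the new runtime takes effect:

sudo systemctl restart docker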

Reference: github-starred/ollama#68485