[GH-ISSUE #4563] Nvidia 555 driver does not work with Ollama #49374


GiteaMirror · 2026-04-28T11:36:01-05:00

GiteaMirror commented

2026-04-28 11:36:01 -05:00

Originally created by @ginestopo on GitHub (May 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4563

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I just updated the NVIDIA drivers on my host to the 555 series (screenshot: https://github.com/ollama/ollama/assets/12192256/5eb48002-a7dd-4d1e-bc9f-f64564a216c3). I have an RTX 4070 Ti.

Then, when I run Ollama inside my container (which is running Ubuntu 20.04), Ollama does not use the GPU (I can tell because GPU utilization stays at 1% while it is responding).

This is my log when running Ollama:

2024/05/21 21:31:02 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-05-21T21:31:02.076Z level=INFO source=images.go:704 msg="total blobs: 5"
time=2024-05-21T21:31:02.076Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-21T21:31:02.077Z level=INFO source=routes.go:1054 msg="Listening on 127.0.0.1:11434 (version 0.1.38)"
time=2024-05-21T21:31:02.077Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama53751129/runners
time=2024-05-21T21:31:04.593Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60002 cpu cpu_avx cpu_avx2]"
time=2024-05-21T21:31:04.669Z level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="15.6 GiB" available="165.9 MiB"
[GIN] 2024/05/21 - 21:31:25 | 200 |      34.776µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/05/21 - 21:31:25 | 404 |     146.669µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/05/21 - 21:31:25 | 200 |  776.506553ms |       127.0.0.1 | POST     "/api/pull"
[GIN] 2024/05/21 - 21:31:35 | 200 |      19.307µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/05/21 - 21:31:35 | 200 |     482.647µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/05/21 - 21:31:35 | 200 |     323.505µs |       127.0.0.1 | POST     "/api/show"
time=2024-05-21T21:31:36.117Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="145.4 MiB" memory.required.full="4.6 GiB" memory.required.partial="794.5 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-05-21T21:31:36.118Z level=INFO source=server.go:320 msg="starting llama server" cmd="/tmp/ollama53751129/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 34935"
time=2024-05-21T21:31:36.118Z level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-05-21T21:31:36.119Z level=INFO source=server.go:504 msg="waiting for llama runner to start responding"
time=2024-05-21T21:31:36.119Z level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="952d03d" tid="140248322381696" timestamp=1716327096
INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140248322381696" timestamp=1716327096 total_threads=16
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="34935" tid="140248322381696" timestamp=1716327096
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
time=2024-05-21T21:31:36.370Z level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors:        CPU buffer size =  4437.80 MiB
.......................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.50 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
INFO [main] model loaded | tid="140248322381696" timestamp=1716327097
time=2024-05-21T21:31:37.123Z level=INFO source=server.go:545 msg="llama runner started in 1.00 seconds"
[GIN] 2024/05/21 - 21:31:37 | 200 |   1.64966928s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/05/21 - 21:31:43 | 200 |   4.29836565s |       127.0.0.1 | POST     "/api/chat"

Edit:
In addition, the container successfully detects the passed-through GPU when I run nvidia-smi.
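The entry in the log above that shows the CPU fallback is the "inference compute" line with library=cpu. As a minimal sketch (not part of Ollama; the helper names parse_logfmt and detected_library are made up for illustration), a captured server log could be scanned for that entry as follows, assuming the key=value layout seen in the log above:

```python
import shlex
from typing import Dict, Optional


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split a key=value log line into a dict, honouring double-quoted values."""
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def detected_library(log_text: str) -> Optional[str]:
    """Return the 'library' field of the first 'inference compute' entry, if any."""
    for line in log_text.splitlines():
        # Only parse the lines of interest; other log lines may not be key=value.
        if 'msg="inference compute"' in line:
            return parse_logfmt(line).get("library")
    return None


# Line copied from the log in this issue.
SAMPLE_LINE = (
    'time=2024-05-21T21:31:04.669Z level=INFO source=types.go:71 '
    'msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" '
    'total="15.6 GiB" available="165.9 MiB"'
)

print(detected_library(SAMPLE_LINE))  # prints: cpu
```

A value of cpu here matches the later "offload to gpu" entry reporting layers.real=0, meaning no layers were placed on the GPU.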

OS

Docker

GPU

Nvidia

CPU

AMD

Ollama version

0.1.38


GiteaMirror added the docker bug nvidia labels 2026-04-28 11:36:02 -05:00

GiteaMirror closed this issue

2026-04-28 11:36:05 -05:00

GiteaMirror commented

2026-04-28 11:36:06 -05:00

@Zyfax commented on GitHub (May 21, 2024):

Interestingly, I experienced a similar phenomenon when I upgraded from driver version 535 to 550 - my CPU usage remained high until I rebooted the host machine.
As for the nvidia-smi command, if it outputs an error message, it's likely indicative of a problem with the connection between the NVIDIA driver and graphics card.


GiteaMirror commented

2026-04-28 11:36:07 -05:00

@pdevine commented on GitHub (May 21, 2024):

What's the output of ollama ps?


GiteaMirror commented

2026-04-28 11:36:07 -05:00

@dhiltgen commented on GitHub (May 22, 2024):

Please add -e OLLAMA_DEBUG=1 to your container and share the log so we can see a little more detail on why it can't discover the GPU. Also try docker run --gpus all ubuntu nvidia-smi to see if the Docker + Nvidia container runtime has become unhealthy.
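A rough sketch of the two commands requested above, written as Python helpers that only build the argument lists. The function names are hypothetical, and the base docker run invocation is assumed to be the stock ollama/ollama one quoted elsewhere in this thread; a custom container like the reporter's would need its own image name and options.

```python
import shlex
from typing import List


def ollama_run_command(debug: bool = False) -> List[str]:
    """Build the `docker run` argv for the ollama/ollama image.

    All flags are taken from commands quoted in this thread; nothing is executed.
    """
    cmd = ["docker", "run", "-d", "--gpus=all"]
    if debug:
        # Environment variable requested by the maintainer for verbose GPU discovery logs.
        cmd += ["-e", "OLLAMA_DEBUG=1"]
    cmd += [
        "-v", "ollama:/root/.ollama",
        "-p", "11434:11434",
        "--name", "ollama",
        "ollama/ollama",
    ]
    return cmd


def gpu_check_command() -> List[str]:
    """Build the sanity check that runs nvidia-smi in a plain ubuntu container."""
    return ["docker", "run", "--gpus", "all", "ubuntu", "nvidia-smi"]


print(shlex.join(ollama_run_command(debug=True)))
print(shlex.join(gpu_check_command()))
```

Passing either list to subprocess.run would execute it; building the list first keeps the flags easy to inspect before anything is started.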


GiteaMirror commented

2026-04-28 11:36:08 -05:00

@brodieferguson commented on GitHub (May 22, 2024):

I don't believe it's related to Ollama. I also had this issue, and I discovered it when setting up a container unrelated to Ollama. The cause is the new 555 drivers, which affect any CUDA/GPU-related container (I tested several, including the base PyTorch CUDA Docker image).

For example, trying to list Cuda capability in pytorch docker gives a "CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error?"

Nvidia computer container benchmark said 1 device requested, 0 available.

Issue was instantly fixed by reverting to prior drivers.


GiteaMirror commented

2026-04-28 11:36:08 -05:00

@ginestopo commented on GitHub (May 22, 2024):

Thank you very much @brodieferguson! This seemed to do the trick. Nevertheless, I am not happy about downgrading my GPU drivers in order to make Ollama work. For that reason I wouldn't consider this issue resolved, and I will gladly provide more information to help solve the problem if needed.

@dhiltgen I just downgraded my drivers to the immediately preceding version and Ollama started using the GPU (RTX 4070 Ti) instantly (screenshot: https://github.com/ollama/ollama/assets/12192256/2941214c-6f57-4513-b9f1-0365aeae2624).

There might be an incompatibility with the new drivers as @brodieferguson had the same problem and probably many other users too.

Thank you very much both. I hope I can enjoy Ollama with the latest drivers soon.


GiteaMirror commented

2026-04-28 11:36:09 -05:00

@nerdpudding commented on GitHub (May 24, 2024):

Same issue here:
Ollama worked fine on the GPU before I upgraded both Ollama and the NVIDIA drivers, as far as I know. I am on Windows 11 with WSL2 and using Docker Desktop. This morning I did the following:

1. noticed new Nvidia drivers available: 555.85; it also included a PhysX update this time (first time I saw that in years actually): version 9.23.1019 --> installed both
2. docker pull ollama/ollama to get the 0.1.38 version (I was on 0.1.37 before)
3. deleted the existing Ollama container. Then did: docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Later I noticed that Ollama no longer uses my GPU: it was much slower, and the resource monitor showed that GPU memory was not being used. The newly available ollama ps command confirmed the same thing:
NAME              ID              SIZE      PROCESSOR    UNTIL
mistral:latest    61e88e884507    4.6 GB    100% CPU     4 minutes from now

nvidia-smi clearly showed GPU is available:

nvidia-smi
Fri May 24 07:34:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:0A:00.0  On |                  Off |
| 50%   27C    P8             52W /  530W |     933MiB /  24564MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        23      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        27      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        31      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

I will downgrade drivers as well, but there clearly is an issue with Ollama and these drivers.

GiteaMirror commented

2026-04-28 11:36:09 -05:00

@nerdpudding commented on GitHub (May 24, 2024):

confirmed, reverted to 552.42 drivers, GPU is now used again.
ollama ps
NAME ID SIZE PROCESSOR UNTIL
mistral:latest 61e88e884507 5.1 GB 100% GPU 4 minutes from now
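The ollama ps output quoted in this thread can also be checked programmatically. The following is only a sketch: the helper names are invented for illustration, and the parsing assumes the single-processor "100% CPU" / "100% GPU" form shown here, not any other layout the command may print.

```python
import re
from typing import Dict, List, Tuple

# Matches the PROCESSOR column as it appears in this thread, e.g. "100% CPU".
_PROCESSOR = re.compile(r"(\d+)%\s+(CPU|GPU)\b")


def processor_by_model(ps_output: str) -> Dict[str, Tuple[int, str]]:
    """Map each model name to (percent, device) parsed from `ollama ps` text."""
    result: Dict[str, Tuple[int, str]] = {}
    for line in ps_output.splitlines():
        match = _PROCESSOR.search(line)
        if match is None:
            continue  # header row, blank line, or an unrecognised layout
        name = line.split()[0]
        result[name] = (int(match.group(1)), match.group(2))
    return result


def cpu_only_models(ps_output: str) -> List[str]:
    """Return the models that are running entirely on the CPU."""
    return [
        name
        for name, (percent, device) in processor_by_model(ps_output).items()
        if device == "CPU" and percent == 100
    ]


# Output copied from the comments in this thread (with the 555.85 driver).
WITH_555 = (
    "NAME ID SIZE PROCESSOR UNTIL\n"
    "mistral:latest 61e88e884507 4.6 GB 100% CPU 4 minutes from now\n"
)

print(cpu_only_models(WITH_555))  # prints: ['mistral:latest']
```

Run against the output captured after reverting to 552.42, the same helper returns an empty list, since the model then reports 100% GPU.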


GiteaMirror commented

2026-04-28 11:36:10 -05:00

@ginestopo commented on GitHub (May 24, 2024):

@nerdpudding thanks for your contribution. We can confirm this is not an isolated issue: for some reason, Nvidia driver 555.85 causes Ollama not to use the GPU.


GiteaMirror commented

2026-04-28 11:36:10 -05:00

@TSavo commented on GitHub (May 26, 2024):

Can confirm, no CUDA Docker image works with 555. Downgrading to 552 fixes the issue. This is unrelated to Ollama and needs to be fixed by Docker/Nvidia.


GiteaMirror commented

2026-04-28 11:36:11 -05:00

@jmorganca commented on GitHub (May 26, 2024):

Hi folks it seems the 555 Nvidia driver branch is not working with Ollama (and other projects that integrate llama.cpp). We're working to resolve this together – in the meantime downgrading to a prior version will fix the issue. So sorry about this and will post more updates here.


GiteaMirror commented

2026-04-28 11:36:11 -05:00

@jmorganca commented on GitHub (May 26, 2024):

Hi all, this seems to be from the nvidia_uvm kernel module not being loaded. You can run:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

to load them manually. A fix is coming with the install script.

Also, adding:

nvidia
nvidia-uvm

to /etc/modules-load.d/nvidia.conf will make sure they are loaded on startup
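Whether the two modules are actually loaded can be verified from /proc/modules on a Linux host. The sketch below is illustrative only: missing_modules is a made-up helper, and the sample text is invented rather than taken from any machine in this thread. Names are normalised because the kernel lists modules with underscores, while the config entries above use a hyphen.

```python
from typing import Iterable, List

# The two modules named in the comment above.
REQUIRED_MODULES = ("nvidia", "nvidia-uvm")


def missing_modules(
    proc_modules_text: str, required: Iterable[str] = REQUIRED_MODULES
) -> List[str]:
    """Return the required kernel modules absent from /proc/modules-style text."""
    # The first whitespace-separated field of each line is the module name.
    loaded = {
        line.split()[0] for line in proc_modules_text.splitlines() if line.strip()
    }
    # /proc/modules reports names with underscores, so normalise hyphens.
    wanted = [name.replace("-", "_") for name in required]
    return [name for name in wanted if name not in loaded]


# Invented sample in /proc/modules format: nvidia is loaded, nvidia_uvm is not.
SAMPLE = (
    "nvidia_drm 77824 4 - Live 0x0000000000000000\n"
    "nvidia_modeset 1314816 6 nvidia_drm, Live 0x0000000000000000\n"
    "nvidia 56807424 250 nvidia_modeset, Live 0x0000000000000000\n"
)

print(missing_modules(SAMPLE))  # prints: ['nvidia_uvm']
```

On a real host, the text would come from reading /proc/modules; an empty result would mean both modules are present.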


GiteaMirror commented

2026-04-28 11:36:13 -05:00

@jmorganca commented on GitHub (May 26, 2024):

Hi folks it seems this is from the new driver packages not loading the nvidia_uvm kernel module from what I can see. It should work to re-load the module:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

and then to keep it loaded, edit the config for nvidia-persistenced by adding them to /etc/modules-load.d/nvidia.conf

nvidia
nvidia-uvm


GiteaMirror commented

2026-04-28 11:36:14 -05:00

@ginestopo commented on GitHub (May 27, 2024):

@jmorganca Thanks a lot for the fix! 💙


GiteaMirror commented

2026-04-28 11:36:15 -05:00

@falmanna commented on GitHub (May 27, 2024):

> Hi folks it seems this is from the new driver packages not loading the `nvidia_uvm` kernel module from what I can see. It should work to re-load the module:
>
> ```
> sudo modprobe nvidia
> sudo modprobe nvidia_uvm
> ```
>
> and then to keep it loaded, edit the config for `nvidia-persistenced` by adding them to `/etc/modules-load.d/nvidia.conf`
>
> ```
> nvidia
> nvidia-uvm
> ```

I'm not sure how to apply these on a WSL Docker installation. I downgraded my driver for now and it is working again.

GiteaMirror commented

2026-04-28 11:36:15 -05:00

@nerdpudding commented on GitHub (Jun 3, 2024):

Same issue here. The modprobe nvidia fix doesn't seem to work with WSL 2.

I’m not an expert, but I tried different Docker builds using the install script with pytorch:2.3.0-cuda12.1-cudnn8-runtime as the base image. Everything works fine with older NVIDIA drivers, but not with 555.85.

Even though nvidia-smi shows the GPU is available (both in WSL and the running container), Ollama defaults to CPU. The debug logs look like this:

time=2024-06-03T11:59:56.076Z level=DEBUG source=gpu.go:355 msg="Unable to load nvcuda" library=/usr/lib/x86_64-linux-gnu/libcuda.so.1 error="nvcuda init failure: 500"

Any ideas or suggestions on what might be causing this with the latest drivers? Should we just wait and see whether a newer NVIDIA driver release fixes it?

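As a rough sketch (not from the original comment), the two log signatures quoted in this thread can be checked mechanically. The function name `gpu_fallback_reason` and its three output labels are hypothetical, and the patterns are copied from the log lines shown here, so other Ollama versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: read Ollama server log lines on stdin and print a
# short label describing whether they show a CPU fallback.
gpu_fallback_reason() {
    gfr_log=$(cat)
    if printf '%s\n' "$gfr_log" | grep -q 'Unable to load nvcuda'; then
        # Debug-level message quoted above: the CUDA driver library failed to initialise.
        echo "cuda-init-failed"
    elif printf '%s\n' "$gfr_log" | grep -q 'msg="inference compute".*library=cpu'; then
        # Info-level message from the original report: only the CPU was detected.
        echo "cpu-only"
    else
        echo "no-fallback-seen"
    fi
}
```

Piping a saved server log into `gpu_fallback_reason` then gives a one-word summary. The first label only appears when the server was started with `OLLAMA_DEBUG=1`, since that message is logged at debug level.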

GiteaMirror commented

2026-04-28 11:36:15 -05:00

@ginestopo commented on GitHub (Jun 3, 2024):

@nerdpudding Did you try using Ollama v0.1.41 (the latest release)?


GiteaMirror commented

2026-04-28 11:36:15 -05:00

@nerdpudding commented on GitHub (Jun 3, 2024):

> @nerdpudding Did you try using Ollama v0.1.41 (the latest release) ?

Yes. First I tried pulling the latest official Docker image.
Then I used this custom Docker build, which uses install.sh and installs v0.1.41 (I must admit that ChatGPT created the Dockerfile for me, so it is probably not ideal):

# Use the official PyTorch image with CUDA support
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

# Install dependencies
RUN apt-get update && apt-get install -y \
    curl \
    sudo \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user
RUN useradd -m -s /bin/bash ollama
RUN echo 'ollama ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER ollama
WORKDIR /home/ollama

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Ensure the volumes are correctly set up
VOLUME ["/root/.ollama"]

# Set environment variables for CUDA and debugging
ENV OLLAMA_DEBUG=1
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
ENV PATH=/usr/local/cuda/bin:$PATH

# Ensure correct library links
RUN sudo mkdir -p /usr/local/cuda/lib64/stubs && sudo ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/local/cuda/lib64/stubs/libcuda.so

# Run Ollama
CMD ["ollama", "serve"]

This works fine with the old drivers, not with the new.


GiteaMirror commented

2026-04-28 11:36:16 -05:00

@brodieferguson commented on GitHub (Jun 3, 2024):

@nerdpudding Are you using Docker Desktop? smi worked for me too, but not the rest.

https://github.com/NVIDIA/nvidia-container-toolkit/issues/520

> - If you see this symptom using Docker CE on Linux under WSL2, please update your nvidia-container-toolkit to 1.14.4 or newer.
> - If you see this symptom using Docker Desktop, a fix (to upgrade the bundled nvidia-container-toolkit) is in progress; we will reply back here when it is published. Until that fix is ready, if you are using Docker Desktop, please use NVIDIA Driver 552.xx or earlier.

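A minimal sketch (not from the original comment) of the "1.14.4 or newer" check as a dotted-version comparison. The helper name `version_ge` is hypothetical, and it relies on `sort -V`, which GNU coreutils provides; how the installed toolkit version is obtained is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 is greater than or
# equal to dotted version $2.
version_ge() {
    # Version-sort both values; $1 >= $2 exactly when $2 sorts first
    # (or both are equal).
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For instance, `version_ge 1.15.0 1.14.4` succeeds while `version_ge 1.14.3 1.14.4` fails, so the result can gate a warning that the toolkit should be updated before moving to a 555 driver.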

GiteaMirror commented

2026-04-28 11:36:16 -05:00

@nerdpudding commented on GitHub (Jun 3, 2024):

Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the 'nvidia_uvm' workaround is not possible with Docker Desktop, and I was (pointlessly) trying to find something that is... Thanks for pointing out that they are still working on a solution for Docker Desktop users; I'll just be more patient :-)


GiteaMirror commented

2026-04-28 11:36:17 -05:00

@zimdin12 commented on GitHub (Jun 6, 2024):

> Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the workaround with the 'nvidia_uvm' was not possible with docker desktop and I was (pointlessly) trying to look for a something that does...Thanks for pointing out they are still working on a solution for Docker Desktop users, I'll just be more patient :-)

Is there a separate issue for that (this one got closed)? I would like to keep an eye on it, so I know when to update :D


GiteaMirror commented

2026-04-28 11:36:17 -05:00

@nerdpudding commented on GitHub (Jun 6, 2024):

> > Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the workaround with the 'nvidia_uvm' was not possible with docker desktop and I was (pointlessly) trying to look for a something that does...Thanks for pointing out they are still working on a solution for Docker Desktop users, I'll just be more patient :-)
>
> Is there a separate issue for that (this one got closed) ? I would like to keep my eye on it. then I would know when to update :D

Not sure, but as I understand it, this is an NVIDIA issue, so...
By the way, I noticed that newer drivers were released today (555.99), and I hoped they would fix it after reading this:

https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/543951/geforce-grd-55599-feedback-thread-released-6424/
Fixed General Bugs:
CUDA 12.5 does not work with CUDA enabled Docker images [4668302]

I just installed the newer driver, rebooted, and tested with both my own Dockerfile (which uses pytorch:2.3.0-cuda12.1-cudnn8-runtime and then installs Ollama using the install.sh script) and the latest official Ollama Docker image. Both still use only the CPU. I reverted to 551.44 and the GPU was immediately used again. So apparently that 'general fix' does not yet apply to WSL2 with Docker Desktop, but maybe only to Docker CE.

I'm not sure whether there is another open issue for it here, but I guess it is an NVIDIA issue, so we probably just have to watch the forums/threads there and wait until they release a fix in a newer driver.


GiteaMirror commented

2026-04-28 11:36:17 -05:00

@falmanna commented on GitHub (Jun 6, 2024):

> Is there a separate issue for that (this one got closed) ? I would like to keep my eye on it. then I would know when to update :D

I subscribed to this one:

> @nerdpudding Are you using Docker Desktop? smi worked for me too, but not the rest.
>
> [NVIDIA/nvidia-container-toolkit#520](https://github.com/NVIDIA/nvidia-container-toolkit/issues/520)

GiteaMirror commented

2026-04-28 11:36:17 -05:00

@nerdpudding commented on GitHub (Jun 7, 2024):

Docker has released an update for Docker Desktop.

See https://docs.docker.com/desktop/release-notes/
Upgrades:
--> NVIDIA Container Toolkit v1.15.0

I just tested it, and the GPU is now used with NVIDIA driver 555.99 after upgrading Docker Desktop to 4.31.0.

This fixed it for me!

So if you are using Docker on Windows with WSL 2 (now not only Docker CE, but also Docker Desktop), it will work again after updating.



Reference: github-starred/ollama#49374