[GH-ISSUE #14387] NVIDIA A800-SXM4-80GB, CentOS 7.9: Ollama running in a container always reports total_vram="0 B". How to solve and troubleshoot? #71408

Closed
opened 2026-05-05 01:32:40 -05:00 by GiteaMirror · 6 comments

Originally created by @levelol on GitHub (Feb 24, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14387

What is the issue?

The container has been set up with NVIDIA's container runtime, and all of the GPUs are recognized, yet Ollama still reports total_vram="0 B".
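For context, a typical way to launch the Ollama container with GPU access looks like the following (a sketch of the standard invocation from the Ollama Docker instructions; the exact command used on this host is not shown in the issue):

```
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama

# Quick sanity check that the GPUs are visible inside the container:
docker exec -it ollama nvidia-smi
```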

Relevant log output

time=2026-02-24T07:12:24.748Z level=INFO source=routes.go:1663 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-02-24T07:12:24.748Z level=INFO source=routes.go:1665 msg="Ollama cloud disabled: false"
time=2026-02-24T07:12:24.752Z level=INFO source=images.go:473 msg="total blobs: 10"
time=2026-02-24T07:12:24.753Z level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-24T07:12:24.753Z level=INFO source=routes.go:1718 msg="Listening on [::]:11434 (version 0.17.0)"
time=2026-02-24T07:12:24.754Z level=DEBUG source=sched.go:147 msg="starting llm scheduler"
time=2026-02-24T07:12:24.754Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-24T07:12:24.755Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42227"
time=2026-02-24T07:12:24.755Z level=DEBUG source=server.go:432 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 OLLAMA_GPU=100 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-02-24T07:12:24.879Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=124.745535ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-02-24T07:12:24.880Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41153"
time=2026-02-24T07:12:24.880Z level=DEBUG source=server.go:432 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 OLLAMA_GPU=100 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-02-24T07:12:24.979Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=99.725134ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-02-24T07:12:24.979Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-02-24T07:12:24.979Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=0
time=2026-02-24T07:12:24.979Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=225.481307ms
time=2026-02-24T07:12:24.980Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="2015.3 GiB" available="1951.5 GiB"
time=2026-02-24T07:12:24.980Z level=INFO source=routes.go:1768 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[GIN] 2026/02/24 - 07:13:17 | 200 |     234.575µs |       127.0.0.1 | GET      "/api/version"

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

ollama version is 0.17.0

GiteaMirror added the bug label 2026-05-05 01:32:40 -05:00

@rick-github commented on GitHub (Feb 24, 2026):

Set `OLLAMA_DEBUG=2`, restart the server and post the log from the start to the line that contains `inference compute`.
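For a Docker deployment, one way to apply this is to recreate the container with the variable set (a sketch; the container name and volume are assumptions based on the standard setup):

```
docker rm -f ollama
docker run -d --gpus=all -e OLLAMA_DEBUG=2 -v ollama:/root/.ollama \
    -p 11434:11434 --name ollama ollama/ollama
docker logs -f ollama   # capture from startup through the "inference compute" line
```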


@levelol commented on GitHub (Feb 24, 2026):

1. NVIDIA-SMI 550.54.15, Driver Version: 550.54.15, CUDA Version: 12.4

2. OLLAMA_DEBUG=2 logs:

time=2026-02-24T08:36:07.638Z level=INFO source=routes.go:1663 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-02-24T08:36:07.638Z level=INFO source=routes.go:1665 msg="Ollama cloud disabled: false"
time=2026-02-24T08:36:07.639Z level=INFO source=images.go:473 msg="total blobs: 10"
time=2026-02-24T08:36:07.639Z level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-24T08:36:07.640Z level=INFO source=routes.go:1718 msg="Listening on [::]:11434 (version 0.17.0)"
time=2026-02-24T08:36:07.640Z level=DEBUG source=sched.go:147 msg="starting llm scheduler"
time=2026-02-24T08:36:07.641Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-24T08:36:07.641Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extraEnvs=map[]
time=2026-02-24T08:36:07.644Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41353"
time=2026-02-24T08:36:07.644Z level=DEBUG source=server.go:432 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 OLLAMA_GPU=100 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-02-24T08:36:07.677Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-24T08:36:07.678Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:41353"
time=2026-02-24T08:36:07.686Z level=DEBUG source=gguf.go:589 msg=general.architecture type=string
time=2026-02-24T08:36:07.686Z level=DEBUG source=gguf.go:589 msg=tokenizer.ggml.model type=string
time=2026-02-24T08:36:07.686Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
time=2026-02-24T08:36:07.686Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
time=2026-02-24T08:36:07.686Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.file_type default=0
time=2026-02-24T08:36:07.686Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.name default=""
time=2026-02-24T08:36:07.686Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.description default=""
time=2026-02-24T08:36:07.686Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-02-24T08:36:07.686Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-02-24T08:36:07.704Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: failed to initialize CUDA: system not yet initialized
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-02-24T08:36:07.762Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-02-24T08:36:07.762Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.block_count default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.pooling_type default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.expert_count default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.block_count default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.embedding_length default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-02-24T08:36:07.763Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-02-24T08:36:07.764Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=78.449509ms
time=2026-02-24T08:36:07.764Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=5.061µs
time=2026-02-24T08:36:07.765Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices=[]
time=2026-02-24T08:36:07.765Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=124.213749ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-02-24T08:36:07.765Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extraEnvs=map[]
time=2026-02-24T08:36:07.765Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41882"
time=2026-02-24T08:36:07.766Z level=DEBUG source=server.go:432 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 OLLAMA_GPU=100 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-02-24T08:36:07.786Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-24T08:36:07.787Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:41882"
time=2026-02-24T08:36:07.797Z level=DEBUG source=gguf.go:589 msg=general.architecture type=string
time=2026-02-24T08:36:07.797Z level=DEBUG source=gguf.go:589 msg=tokenizer.ggml.model type=string
time=2026-02-24T08:36:07.797Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
time=2026-02-24T08:36:07.797Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.alignment default=32
time=2026-02-24T08:36:07.797Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.file_type default=0
time=2026-02-24T08:36:07.797Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.name default=""
time=2026-02-24T08:36:07.797Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=general.description default=""
time=2026-02-24T08:36:07.797Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-02-24T08:36:07.797Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-02-24T08:36:07.816Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
ggml_cuda_init: failed to initialize CUDA: CUDA driver version is insufficient for CUDA runtime version
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-02-24T08:36:07.863Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.block_count default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.pooling_type default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.expert_count default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.block_count default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.embedding_length default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-02-24T08:36:07.863Z level=DEBUG source=ggml.go:300 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-02-24T08:36:07.863Z level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=66.313469ms
time=2026-02-24T08:36:07.863Z level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=1.222µs
time=2026-02-24T08:36:07.863Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" devices=[]
time=2026-02-24T08:36:07.864Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=98.482865ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-02-24T08:36:07.864Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-02-24T08:36:07.864Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=0
time=2026-02-24T08:36:07.864Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[]
time=2026-02-24T08:36:07.864Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=223.681034ms
time=2026-02-24T08:36:07.864Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="2015.3 GiB" available="1951.4 GiB"
time=2026-02-24T08:36:07.864Z level=INFO source=routes.go:1768 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096


@rick-github commented on GitHub (Feb 24, 2026):

Try this:

```
echo 'options nvidia_uvm uvm_disable_hmm=1' | sudo tee -a /etc/modprobe.d/nvidia-uvm.conf
sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
```
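If the module reloads cleanly, the parameter can be verified through sysfs (a quick check, assuming the module exposes it there as usual for module parameters):

```
cat /sys/module/nvidia_uvm/parameters/uvm_disable_hmm   # expect: 1
```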

@levelol commented on GitHub (Feb 25, 2026):

I also tried ollama version 0.13.5; it likewise runs 100% on the CPU, and the following error still appears. Which version supports CUDA 12.4?

time=2026-02-25T02:09:48.248Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-02-25T02:09:48.248Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-02-25T02:09:48.266Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: failed to initialize CUDA: system not yet initialized
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-02-25T02:09:54.821Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.block_count default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.pooling_type default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.expert_count default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.block_count default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.embedding_length default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-02-25T02:09:54.821Z level=DEBUG source=ggml.go:282 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-02-25T02:09:54.821Z level=DEBUG source=runner.go:1380 msg="dummy model load took" duration=6.574581061s
time=2026-02-25T02:09:54.821Z level=DEBUG source=runner.go:1385 msg="gathering device infos took" duration=661ns
time=2026-02-25T02:09:54.823Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" devices=[]
time=2026-02-25T02:09:54.823Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=6.608669139s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-02-25T02:09:54.823Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=0
time=2026-02-25T02:09:54.823Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[]
time=2026-02-25T02:09:54.823Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=6.714283278s
time=2026-02-25T02:09:54.824Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="2015.3 GiB" available="1955.3 GiB"
time=2026-02-25T02:09:54.824Z level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
[root@a800-8-gpu-deepseek32b ~]#

@rick-github commented on GitHub (Feb 25, 2026):

ollama has supported CUDA 12 since around [0.3.7](https://github.com/ollama/ollama/releases/tag/v0.3.7).
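For completeness, a specific release can be tried by pinning the Docker image tag (a sketch; ollama/ollama images are tagged per release, but verify the tag exists):

```
docker pull ollama/ollama:0.3.7
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama:0.3.7
```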


@levelol commented on GitHub (Feb 25, 2026):

Hello everyone, the problem has been solved: Ollama can now access the GPUs. The missing dependency was nvidia-fabricmanager. On high-performance servers such as the A800/H800/A100 that use NVLink interconnects, the GPUs do not operate independently; they are joined into a single fabric over a high-speed interconnect bus. nvidia-fabricmanager is a daemon required for A800 operation, responsible for managing the NVLink links.

On a machine with internet access, download the following two core RPM packages from NVIDIA's official repository:

Repository path: https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/

Files to download:

- nvidia-fabric-manager-550.54.15-1.x86_64.rpm
- nvidia-fabric-manager-devel-550.54.15-1.x86_64.rpm (recommended to download together to avoid dependency errors)

Then install them locally:

```
yum localinstall -y nvidia-fabric-manager-*.rpm
```
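After installation, the Fabric Manager daemon also has to be running before CUDA can initialize on these systems; a minimal sketch (the unit name nvidia-fabricmanager matches NVIDIA's packaging, but verify it locally):

```
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
systemctl status nvidia-fabricmanager   # should report active (running)

# Once the fabric is initialized, ggml_cuda_init should no longer fail with
# "system not yet initialized", and Ollama should discover the GPUs.
```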

Reference: github-starred/ollama#71408