[GH-ISSUE #11852] Ollama does not offload any layers to nvidia gpu #54380

Closed
opened 2026-04-29 05:51:35 -05:00 by GiteaMirror · 5 comments

Originally created by @helgehr on GitHub (Aug 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11852

What is the issue?

I am out of ideas as to why Ollama does not offload any layers to the GPU. I am using a machine with an NVIDIA A100 80GB GPU and am trying to run a small model such as gemma3:1b, as shown in the attached log file. The NVIDIA GPU is detected, but it is not used when running the model. Instead, the model is loaded entirely on the CPU, and only the CPU backend is loaded:
load_backend: loaded CPU backend from /work/mygroup/me/ollama/ollama/lib/ollama/libggml-cpu-haswell.so

On another machine with a very similar configuration, this setup works without problems, and the line
load_backend: loaded CUDA backend from /work/mygroup/me/ollama/ollama/lib/ollama/libggml-cuda.so
additionally shows up in the log file.

Furthermore, all shared libraries needed by libggml-cuda are available, as the ldd output at the end of the log shows.

The attached log file, captured with OLLAMA_DEBUG=1, gives me no clue as to why Ollama is unable to load the GPU backend. I would therefore be thankful for any tips on what is going wrong here!
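
In case it helps others debug something similar: the GPU-related environment that ollama serve inherits can be listed with something like

        env | grep -iE 'cuda|rocr|hip|hsa'    # show GPU-related variables in the current shell

As the server config line in the log below shows, ROCR_VISIBLE_DEVICES:0 is among the variables set on this machine.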

Relevant log output

me@gpu_machine - bin%./ollama serve &
[1] 2110802
me@gpu_machine - bin%time=2025-08-11T11:31:15.947+02:00 level=INFO source=routes.go:1304 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL:0 HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/work/mygroup/me/ollama/models/ OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:0 http_proxy: https_proxy: no_proxy:]"
time=2025-08-11T11:31:16.153+02:00 level=INFO source=images.go:477 msg="total blobs: 17"
time=2025-08-11T11:31:16.230+02:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-08-11T11:31:16.267+02:00 level=INFO source=routes.go:1357 msg="Listening on 127.0.0.1:11434 (version 0.11.4)"
time=2025-08-11T11:31:16.267+02:00 level=DEBUG source=sched.go:106 msg="starting llm scheduler"
time=2025-08-11T11:31:16.267+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-08-11T11:31:16.291+02:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-08-11T11:31:16.291+02:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-08-11T11:31:16.291+02:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/work/mygroup/me/ollama/ollama/lib/ollama/libcuda.so* /sw/spack-levante/nvhpc-24.7-py26uc/Linux_x86_64/24.7/compilers/lib/libcuda.so* /sw/spack-levante/perl-alien-svn-1.8.11.0-s7bpqh/lib/perl5/x86_64-linux-thread-multi/Alien/SVN/libcuda.so* /work/mygroup/me/ollama/ollama/lib/ollama/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-08-11T11:31:16.315+02:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[/usr/lib64/libcuda.so.560.35.05]
initializing /usr/lib64/libcuda.so.560.35.05
dlsym: cuInit - 0x7fff6d896800
dlsym: cuDriverGetVersion - 0x7fff6d896820
dlsym: cuDeviceGetCount - 0x7fff6d896860
dlsym: cuDeviceGet - 0x7fff6d896840
dlsym: cuDeviceGetAttribute - 0x7fff6d896940
dlsym: cuDeviceGetUuid - 0x7fff6d8968a0
dlsym: cuDeviceGetName - 0x7fff6d896880
dlsym: cuCtxCreate_v3 - 0x7fff6d8a1020
dlsym: cuMemGetInfo_v2 - 0x7fff6d8ac4e0
dlsym: cuCtxDestroy - 0x7fff6d9071b0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f1c
CUDA driver version: 12.6
calling cuDeviceGetCount
device count 1
time=2025-08-11T11:31:16.472+02:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib64/libcuda.so.560.35.05
[GPU-974624fb-16fe-4613-6582-286897a53b6a] CUDA totalMem 81155mb
[GPU-974624fb-16fe-4613-6582-286897a53b6a] CUDA freeMem 80731mb
[GPU-974624fb-16fe-4613-6582-286897a53b6a] Compute Capability 8.0
time=2025-08-11T11:31:16.787+02:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
releasing cuda driver library
time=2025-08-11T11:31:16.787+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-974624fb-16fe-4613-6582-286897a53b6a library=cuda variant=v12 compute=8.0 driver=12.6 name="NVIDIA A100-SXM4-80GB" total="79.3 GiB" available="78.8 GiB"

me@gpu_machine - bin%./ollama run gemma3:1b
[GIN] 2025/08/11 - 11:31:30 | 200 |       76.73µs |       127.0.0.1 | HEAD     "/"
time=2025-08-11T11:31:30.578+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
[GIN] 2025/08/11 - 11:31:30 | 200 |  275.861135ms |       127.0.0.1 | POST     "/api/show"
time=2025-08-11T11:31:30.721+02:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="503.7 GiB" before.free="489.2 GiB" before.free_swap="0 B" now.total="503.7 GiB" now.free="488.9 GiB" now.free_swap="0 B"
initializing /usr/lib64/libcuda.so.560.35.05
dlsym: cuInit - 0x7fff6d896800
dlsym: cuDriverGetVersion - 0x7fff6d896820
dlsym: cuDeviceGetCount - 0x7fff6d896860
dlsym: cuDeviceGet - 0x7fff6d896840
dlsym: cuDeviceGetAttribute - 0x7fff6d896940
dlsym: cuDeviceGetUuid - 0x7fff6d8968a0
dlsym: cuDeviceGetName - 0x7fff6d896880
dlsym: cuCtxCreate_v3 - 0x7fff6d8a1020
dlsym: cuMemGetInfo_v2 - 0x7fff6d8ac4e0
dlsym: cuCtxDestroy - 0x7fff6d9071b0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f1c
CUDA driver version: 12.6
calling cuDeviceGetCount
device count 1
time=2025-08-11T11:31:31.002+02:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-974624fb-16fe-4613-6582-286897a53b6a name="NVIDIA A100-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.8 GiB" now.total="79.3 GiB" now.free="78.8 GiB" now.used="424.2 MiB"
releasing cuda driver library
time=2025-08-11T11:31:31.002+02:00 level=DEBUG source=sched.go:183 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-08-11T11:31:31.041+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-08-11T11:31:31.140+02:00 level=DEBUG source=sched.go:226 msg="loading first model" model=/work/mygroup/me/ollama/models/blobs/sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01
time=2025-08-11T11:31:31.140+02:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[78.8 GiB]"
time=2025-08-11T11:31:31.140+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.block_count default=0
time=2025-08-11T11:31:31.140+02:00 level=INFO source=sched.go:786 msg="new model will fit in available VRAM in single GPU, loading" model=/work/mygroup/me/ollama/models/blobs/sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01 gpu=GPU-974624fb-16fe-4613-6582-286897a53b6a parallel=1 available=84653113344 required="1.7 GiB"
time=2025-08-11T11:31:31.141+02:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="503.7 GiB" before.free="488.9 GiB" before.free_swap="0 B" now.total="503.7 GiB" now.free="488.9 GiB" now.free_swap="0 B"
initializing /usr/lib64/libcuda.so.560.35.05
dlsym: cuInit - 0x7fff6d896800
dlsym: cuDriverGetVersion - 0x7fff6d896820
dlsym: cuDeviceGetCount - 0x7fff6d896860
dlsym: cuDeviceGet - 0x7fff6d896840
dlsym: cuDeviceGetAttribute - 0x7fff6d896940
dlsym: cuDeviceGetUuid - 0x7fff6d8968a0
dlsym: cuDeviceGetName - 0x7fff6d896880
dlsym: cuCtxCreate_v3 - 0x7fff6d8a1020
dlsym: cuMemGetInfo_v2 - 0x7fff6d8ac4e0
dlsym: cuCtxDestroy - 0x7fff6d9071b0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f1c
CUDA driver version: 12.6
calling cuDeviceGetCount
device count 1
time=2025-08-11T11:31:31.408+02:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-974624fb-16fe-4613-6582-286897a53b6a name="NVIDIA A100-SXM4-80GB" overhead="0 B" before.total="79.3 GiB" before.free="78.8 GiB" now.total="79.3 GiB" now.free="78.8 GiB" now.used="424.2 MiB"
releasing cuda driver library
time=2025-08-11T11:31:31.408+02:00 level=INFO source=server.go:135 msg="system memory" total="503.7 GiB" free="488.9 GiB" free_swap="0 B"
time=2025-08-11T11:31:31.408+02:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[78.8 GiB]"
time=2025-08-11T11:31:31.408+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.block_count default=0
time=2025-08-11T11:31:31.409+02:00 level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=27 layers.offload=27 layers.split="" memory.available="[78.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.7 GiB" memory.required.partial="1.7 GiB" memory.required.kv="38.0 MiB" memory.required.allocations="[1.7 GiB]" memory.weights.total="762.5 MiB" memory.weights.repeating="456.5 MiB" memory.weights.nonrepeating="306.0 MiB" memory.graph.full="514.2 MiB" memory.graph.partial="750.5 MiB"
time=2025-08-11T11:31:31.410+02:00 level=DEBUG source=server.go:291 msg="compatible gpu libraries" compatible=[]
time=2025-08-11T11:31:31.488+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-08-11T11:31:31.488+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
time=2025-08-11T11:31:31.488+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.image_size default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.patch_size default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.num_channels default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.block_count default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.embedding_length default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.attention.head_count default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.image_size default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.patch_size default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.rope.freq_scale default=1
time=2025-08-11T11:31:31.492+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.mm_tokens_per_image default=256
time=2025-08-11T11:31:31.496+02:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/work/mygroup/me/ollama/ollama/bin/ollama runner --ollama-engine --model /work/mygroup/me/ollama/models/blobs/sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01 --ctx-size 4096 --batch-size 512 --n-gpu-layers 27 --threads 128 --parallel 1 --port 33697"
time=2025-08-11T11:31:31.496+02:00 level=DEBUG source=server.go:439 msg=subprocess LD_LIBRARY_PATH=/work/mygroup/me/ollama/ollama/lib/ollama:/sw/spack-levante/nvhpc-24.7-py26uc/Linux_x86_64/24.7/compilers/lib:/sw/spack-levante/perl-alien-svn-1.8.11.0-s7bpqh/lib/perl5/x86_64-linux-thread-multi/Alien/SVN:/work/mygroup/me/ollama/ollama/lib/ollama:/work/mygroup/me/ollama/ollama/lib/ollama ROCR_VISIBLE_DEVICES=0 OLLAMA_DEBUG=1 CUDA_VISIBLE_DEVICES=GPU-974624fb-16fe-4613-6582-286897a53b6a PATH=/home/b/me/perl5/bin:/work/mygroup/me/spack_software/vim-8.2.2541-2lyzm3/bin:/home/b/me/myapps/bins:/work/bd1179/me/MyApps/bin:/home/b/me/.local/bin:/work/mygroup/me/dwd_icon_tools/icontools:/home/b/me/.local/myscripts:/home/b/me/.cargo/bin:/sw/bin:/sw/bin:/sw/spack-levante/nvhpc-24.7-py26uc/Linux_x86_64/24.7/compilers/bin:/sw/spack-levante/cryptsetup-2.3.5-kjtctt/sbin:/sw/spack-levante/singularity-3.8.5-w53g5a/bin:/home/b/me/perl5/bin:/work/mygroup/me/spack_software/vim-8.2.2541-2lyzm3/bin:/home/b/me/myapps/bins:/work/mygroup/me/MyApps/bin:/home/b/me/.local/bin:/work/mygroup/me/dwd_icon_tools/icontools:/home/b/me/.local/myscripts:/home/b/me/.cargo/bin:/sw/bin:/sw/bin:/home/b/me/perl5/bin:/work/mygroup/me/mambaforge/condabin:/work/mygroup/me/spack_software/vim-8.2.2541-2lyzm3/bin:/home/b/me/myapps/bins:/work/mygroup/me/MyApps/bin:/home/b/me/.local/bin:/work/mygroup/me/dwd_icon_tools/icontools:/home/b/me/.local/myscripts:/home/b/me/.cargo/bin:/sw/spack-levante/mambaforge-22.9.0-2-Linux-x86_64-kptncg/bin:/sw/spack-levante/git-lfs-2.11.0-oihcwo/bin:/sw/spack-levante/git-2.43.3-bm2hrp/bin:/sw/bin:/sw/bin:/sw/spack-workplace/spack/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/work/mygroup/me/MyApps/fzf/bin OLLAMA_MODELS=/work/mygroup/me/ollama/models/ GPU_DEVICE_ORDINAL=0 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/work/mygroup/me/ollama/ollama/lib/ollama
time=2025-08-11T11:31:31.496+02:00 level=INFO source=sched.go:481 msg="loaded runners" count=1
time=2025-08-11T11:31:31.496+02:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
time=2025-08-11T11:31:31.497+02:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
time=2025-08-11T11:31:31.509+02:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-08-11T11:31:31.509+02:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:33697"
time=2025-08-11T11:31:31.579+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-08-11T11:31:31.579+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.name default=""
time=2025-08-11T11:31:31.579+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.description default=""
time=2025-08-11T11:31:31.579+02:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=340 num_key_values=32
time=2025-08-11T11:31:31.579+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/work/mygroup/me/ollama/ollama/lib/ollama
load_backend: loaded CPU backend from /work/mygroup/me/ollama/ollama/lib/ollama/libggml-cpu-haswell.so
time=2025-08-11T11:31:31.669+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-08-11T11:31:31.671+02:00 level=INFO source=ggml.go:365 msg="offloading 0 repeating layers to GPU"
time=2025-08-11T11:31:31.671+02:00 level=INFO source=ggml.go:369 msg="offloading output layer to CPU"
time=2025-08-11T11:31:31.671+02:00 level=INFO source=ggml.go:376 msg="offloaded 0/27 layers to GPU"
time=2025-08-11T11:31:31.671+02:00 level=INFO source=ggml.go:379 msg="model weights" buffer=CPU size="1.0 GiB"
time=2025-08-11T11:31:31.672+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
time=2025-08-11T11:31:31.672+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.image_size default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.patch_size default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.num_channels default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.block_count default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.embedding_length default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.attention.head_count default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.image_size default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.patch_size default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.rope.freq_scale default=1
time=2025-08-11T11:31:31.676+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gemma3.mm_tokens_per_image default=256
time=2025-08-11T11:31:31.692+02:00 level=DEBUG source=ggml.go:650 msg="compute graph" nodes=1151 splits=1
time=2025-08-11T11:31:31.692+02:00 level=INFO source=ggml.go:668 msg="compute graph" backend=CPU buffer_type=CPU size="36.2 MiB"
time=2025-08-11T11:31:31.692+02:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=320864256A allocated.CPU.Weights="[19491584A 19491584A 19491584A 17328128A 17328128A 19491584A 17328128A 17328128A 19491584A 17328128A 17328128A 19491584A 17328128A 17328128A 19491584A 17328128A 17328128A 19491584A 17328128A 17328128A 19491584A 17328128A 19491584A 19491584A 19491584A 19491584A 320868864A]" allocated.CPU.Cache="[1048576A 1048576A 1048576A 1048576A 1048576A 4194304A 1048576A 1048576A 1048576A 1048576A 1048576A 4194304A 1048576A 1048576A 1048576A 1048576A 1048576A 4194304A 1048576A 1048576A 1048576A 1048576A 1048576A 4194304A 1048576A 1048576A 0U]" allocated.CPU.Graph=38010880A
time=2025-08-11T11:31:31.748+02:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
time=2025-08-11T11:31:31.748+02:00 level=DEBUG source=server.go:643 msg="model load progress 0.06"
time=2025-08-11T11:31:31.999+02:00 level=DEBUG source=server.go:643 msg="model load progress 0.53"
time=2025-08-11T11:31:32.250+02:00 level=INFO source=server.go:637 msg="llama runner started in 0.75 seconds"
time=2025-08-11T11:31:32.250+02:00 level=DEBUG source=sched.go:493 msg="finished setting up" runner.name=registry.ollama.ai/library/gemma3:1b runner.inference=cuda runner.devices=1 runner.size="1.7 GiB" runner.vram="1.7 GiB" runner.parallel=1 runner.pid=2110853 runner.model=/work/mygroup/me/ollama/models/blobs/sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01 runner.num_ctx=4096
[GIN] 2025/08/11 - 11:31:32 | 200 |  1.668672758s |       127.0.0.1 | POST     "/api/generate"
time=2025-08-11T11:31:32.250+02:00 level=DEBUG source=sched.go:501 msg="context for request finished"
time=2025-08-11T11:31:32.250+02:00 level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:1b runner.inference=cuda runner.devices=1 runner.size="1.7 GiB" runner.vram="1.7 GiB" runner.parallel=1 runner.pid=2110853 runner.model=/work/mygroup/me/ollama/models/blobs/sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01 runner.num_ctx=4096 duration=5m0s
time=2025-08-11T11:31:32.250+02:00 level=DEBUG source=sched.go:359 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:1b runner.inference=cuda runner.devices=1 runner.size="1.7 GiB" runner.vram="1.7 GiB" runner.parallel=1 runner.pid=2110853 runner.model=/work/mygroup/me/ollama/models/blobs/sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01 runner.num_ctx=4096 refCount=0
>>>

me@gpu_machine - bin%ldd /work/mygroup/me/ollama/ollama/lib/ollama/libggml-cuda.so
        linux-vdso.so.1 (0x00007ffff7ffa000)
        libggml-base.so => /work/mygroup/me/ollama/ollama/lib/ollama/libggml-base.so (0x00007ffff7f21000)
        libcudart.so.12 => /work/mygroup/me/ollama/ollama/lib/ollama/libcudart.so.12 (0x00007fffaae5f000)
        libcublas.so.12 => /work/mygroup/me/ollama/ollama/lib/ollama/libcublas.so.12 (0x00007fffa3d54000)
        libcublasLt.so.12 => /work/mygroup/me/ollama/ollama/lib/ollama/libcublasLt.so.12 (0x00007fff71a62000)
        libcuda.so.1 => /lib64/libcuda.so.1 (0x00007fff6f899000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fff6f679000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fff6f475000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fff6f26d000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fff6eed8000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fff6eb56000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fff6e93e000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fff6e579000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7dce000)

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.11.4

GiteaMirror added the bug label 2026-04-29 05:51:35 -05:00

@ymilv commented on GitHub (Aug 11, 2025):

I have the same problem after upgrading to version 0.11.4.


@rick-github commented on GitHub (Aug 11, 2025):

Unset ROCR_VISIBLE_DEVICES. #11723
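
A minimal sketch of the workaround, assuming the server is started from a POSIX shell:

        unset ROCR_VISIBLE_DEVICES    # remove the variable before starting the server
        ./ollama serve

If the server runs under systemd instead, the equivalent would be dropping the variable from the unit's Environment= settings.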


@helgehr commented on GitHub (Aug 11, 2025):

Thank you so much, unsetting that variable did fix the issue! 🙏


@stackh34p commented on GitHub (Sep 18, 2025):

I had an issue with Ollama running in Docker locally, where it would not use my GPU even though the GPU device was exposed to Docker. The issue occurred after upgrading to a newer version of the ollama image. The logs contained this message:

ollama load_backend: loaded CUDA backend from /usr/lib/ollama/libggml-cuda.so

Setting the ROCR_VISIBLE_DEVICES environment variable on the image in my docker compose, as @rick-github suggested, seems to solve the issue. Thanks!

Update:

Here is how it should look in a compose file, since it is somewhat unintuitive:

environment:
  - OLLAMA_KEEP_ALIVE=24h
  - NVIDIA_VISIBLE_DEVICES=0  # Use GPU 0 only
  - NVIDIA_DRIVER_CAPABILITIES=compute,utility
  - ROCR_VISIBLE_DEVICES

The idea is to "set" the variable to an empty value so that it overrides the default of the container.
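
As an illustration, one way to verify that the override took effect is to inspect the environment inside the running container (assuming the compose service is named ollama):

        docker compose exec ollama env | grep ROCR_VISIBLE_DEVICES
        # expect an empty value (ROCR_VISIBLE_DEVICES=) or no output at all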


@rick-github commented on GitHub (Sep 18, 2025):

This should have been fixed by https://github.com/ollama/ollama/commit/d5a0d8d904baaf66a5326463a409fe4fa09b2dd2, so unsetting ROCR_VISIBLE_DEVICES is not required on 0.11.5+.
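
As a quick check before dropping the workaround, the installed build can be confirmed like so (illustrative output):

        ./ollama -v
        # ollama version is 0.11.5

Anything at 0.11.5 or newer should no longer need ROCR_VISIBLE_DEVICES unset.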
