[GH-ISSUE #13241] Nvidia GPUs not available as vulkan compute unit in docker #34514

Open
opened 2026-04-22 18:08:49 -05:00 by GiteaMirror · 5 comments

Originally created by @as3ii on GitHub (Nov 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13241

What is the issue?

I have set the following environment variables for my docker container (on Linux); an equivalent docker run sketch follows the list:

      - OLLAMA_NUM_PARALLEL=1
      - OLLAMA_MAX_LOADED_MODELS=1
      - OLLAMA_VULKAN=1
      - OLLAMA_SCHED_SPREAD=1
      - OLLAMA_CONTEXT_LENGTH=8192
      - OLLAMA_FLASH_ATTENTION=1
      - OLLAMA_KV_CACHE_TYPE=q8_0
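
For reference, an equivalent docker run invocation (a sketch only; the volume, port, and image flags follow the standard dockerized ollama setup shown later in this thread, and are assumptions, since the report only lists the environment section):

```
# Sketch: same environment as the compose snippet above, passed via -e.
docker run -d --gpus=all \
  -v ollama:/root/.ollama -p 11434:11434 \
  -e OLLAMA_NUM_PARALLEL=1 \
  -e OLLAMA_MAX_LOADED_MODELS=1 \
  -e OLLAMA_VULKAN=1 \
  -e OLLAMA_SCHED_SPREAD=1 \
  -e OLLAMA_CONTEXT_LENGTH=8192 \
  -e OLLAMA_FLASH_ATTENTION=1 \
  -e OLLAMA_KV_CACHE_TYPE=q8_0 \
  --name ollama ollama/ollama
```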

With this configuration, ollama correctly recognises the Nvidia GPU as CUDA compute and the AMD (integrated) GPU as Vulkan compute. If I add `CUDA_VISIBLE_DEVICES=-1` or `OLLAMA_LLM_LIBRARY=vulkan` to try to run Vulkan on both GPUs, the Nvidia GPU stops being shown as available. The same setup works outside the container.

Container log with `CUDA_VISIBLE_DEVICES=-1`

time=2025-11-25T22:55:52.154Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES:-1 GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:8192 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:true OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-25T22:55:52.155Z level=INFO source=images.go:522 msg="total blobs: 20"
time=2025-11-25T22:55:52.155Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-25T22:55:52.155Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.13.0)"
time=2025-11-25T22:55:52.156Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-25T22:55:52.156Z level=WARN source=runner.go:470 msg="user overrode visible devices" CUDA_VISIBLE_DEVICES=-1
time=2025-11-25T22:55:52.156Z level=WARN source=runner.go:474 msg="if GPUs are not correctly discovered, unset and try again"
time=2025-11-25T22:55:52.157Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38639"
time=2025-11-25T22:55:52.233Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43283"
time=2025-11-25T22:55:52.309Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36709"
time=2025-11-25T22:55:52.351Z level=INFO source=types.go:42 msg="inference compute" id=00000000-c600-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="AMD Radeon 780M (RADV PHOENIX)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:c6:00.0 type=iGPU total="17.5 GiB" available="14.6 GiB"
time=2025-11-25T22:55:52.351Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="17.5 GiB" threshold="20.0 GiB"

Container log without that env var

time=2025-11-25T22:58:36.815Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:8192 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:true OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-25T22:58:36.816Z level=INFO source=images.go:522 msg="total blobs: 20"
time=2025-11-25T22:58:36.816Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-25T22:58:36.816Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.13.0)"
time=2025-11-25T22:58:36.817Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-25T22:58:36.818Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42385"
time=2025-11-25T22:58:37.009Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36637"
time=2025-11-25T22:58:37.232Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37561"
time=2025-11-25T22:58:37.283Z level=INFO source=types.go:42 msg="inference compute" id=GPU-14dc6d32-1419-ab72-9a2e-932cb01dd8db filter_id="" library=CUDA compute=8.9 name=CUDA0 description="NVIDIA RTX 2000 Ada Generation Laptop GPU" libdirs=ollama,cuda_v12 driver=13.0 pci_id=0000:01:00.0 type=discrete total="8.0 GiB" available="7.6 GiB"
time=2025-11-25T22:58:37.283Z level=INFO source=types.go:42 msg="inference compute" id=00000000-c600-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="AMD Radeon 780M (RADV PHOENIX)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:c6:00.0 type=iGPU total="17.5 GiB" available="14.7 GiB"

ollama outside the container, with `CUDA_VISIBLE_DEVICES=-1`

time=2025-11-25T23:42:43.140+01:00 level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES:-1 GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-25T23:42:43.141+01:00 level=INFO source=images.go:522 msg="total blobs: 20"
time=2025-11-25T23:42:43.141+01:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-25T23:42:43.141+01:00 level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.12.11)"
time=2025-11-25T23:42:43.141+01:00 level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-25T23:42:43.142+01:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-25T23:42:43.143+01:00 level=INFO source=server.go:392 msg="starting runner" cmd="/nix/store/hbvz5l80ywg233css06w2h2lx7c05f6n-ollama-0.12.11/bin/ollama runner --ollama-engine --port 43849"
time=2025-11-25T23:42:43.143+01:00 level=DEBUG source=server.go:393 msg=subprocess LD_LIBRARY_PATH=/nix/store/hbvz5l80ywg233css06w2h2lx7c05f6n-ollama-0.12.11/lib/ollama:/nix/store/aqh13b8j7cbj0chnsbpw1i6qm9irxxsn-pipewire-1.4.9-jack/lib PATH=/nix/store/35yc81pz0q5yba14lxhn5r3jx5yg6c3l-bash-interactive-5.3p3/bin:/nix/store/8q2582rd22xp8jlcg1xn1w219q5lx5xa-patchelf-0.15.2/bin:/nix/store/vr15iyyykg9zai6fpgvhcgyw7gckl78w-gcc-wrapper-14.3.0/bin:/nix/store/kzq78n13l8w24jn8bx4djj79k5j717f1-gcc-14.3.0/bin:/nix/store/q6wgv06q39bfhx2xl8ysc05wi6m2zdss-glibc-2.40-66-bin/bin:/nix/store/imad8dvhp77h0pjbckp6wvmnyhp8dpgg-coreutils-9.8/bin:/nix/store/xwydcyvlsa3cvssk0y5llgdhlhjvmqdm-binutils-wrapper-2.44/bin:/nix/store/dc9vaz50jg7mibk9xvqw5dqv89cxzla3-binutils-2.44/bin:/nix/store/hbvz5l80ywg233css06w2h2lx7c05f6n-ollama-0.12.11/bin:/nix/store/imad8dvhp77h0pjbckp6wvmnyhp8dpgg-coreutils-9.8/bin:/nix/store/av4xw9f56xlx5pgv862wabfif6m1yc0a-findutils-4.10.0/bin:/nix/store/20axvl7mgj15m23jgmnq97hx37fgz7bk-diffutils-3.12/bin:/nix/store/drc7kang929jaza6cy9zdx10s4gw1z5p-gnused-4.9/bin:/nix/store/x3zjxxz8m4ki88axp0gn8q8m6bldybba-gnugrep-3.12/bin:/nix/store/y2wdhdcrffp9hnkzk06d178hq3g98jay-gawk-5.3.2/bin:/nix/store/yi3c5karhx764ham5rfwk7iynr8mjf6q-gnutar-1.35/bin:/nix/store/d471xb7sfbah076s8rx02i68zpxc2r5n-gzip-1.14/bin:/nix/store/qm9rxn2sc1vrz91i443rr6f0vxm0zd82-bzip2-1.0.8-bin/bin:/nix/store/3fmzbq9y4m9nk235il7scmvwn8j9zy3p-gnumake-4.4.1/bin:/nix/store/rlq03x4cwf8zn73hxaxnx0zn5q9kifls-bash-5.3p3/bin:/nix/store/qrwznp1ikdf0qw05wia2haiwi32ik5n0-patch-2.8/bin:/nix/store/v0rfdwhg6w6i0yb6dbry4srk6pnj3xp0-xz-5.8.1-bin/bin:/nix/store/paj6a1lpzp57hz1djm5bs86b7ci221r0-file-5.45/bin:/run/wrappers/bin:/home/as3ii/.local/share/flatpak/exports/bin:/var/lib/flatpak/exports/bin:/home/as3ii/.nix-profile/bin:/nix/profile/bin:/home/as3ii/.local/state/nix/profile/bin:/etc/profiles/per-user/as3ii/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin:/home/as3ii/.local/bin OLLAMA_VULKAN=1 CUDA_VISIBLE_DEVICES=-1 OLLAMA_DEBUG=1 OLLAMA_LIBRARY_PATH=/nix/store/hbvz5l80ywg233css06w2h2lx7c05f6n-ollama-0.12.11/lib/ollama
time=2025-11-25T23:42:43.350+01:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=208.143168ms OLLAMA_LIBRARY_PATH=[/nix/store/hbvz5l80ywg233css06w2h2lx7c05f6n-ollama-0.12.11/lib/ollama] extra_envs=map[]
time=2025-11-25T23:42:43.350+01:00 level=DEBUG source=runner.go:116 msg="evluating which if any devices to filter out" initial_count=2
time=2025-11-25T23:42:43.350+01:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=208.655708ms
time=2025-11-25T23:42:43.350+01:00 level=INFO source=types.go:42 msg="inference compute" id=14dc6d32-1419-ab72-9a2e-932cb01dd8db filter_id="" library=Vulkan compute=0.0 name=Vulkan1 description="NVIDIA RTX 2000 Ada Generation Laptop GPU" libdirs=ollama driver=0.0 pci_id=0000:01:00.0 type=discrete total="8.0 GiB" available="7.6 GiB"
time=2025-11-25T23:42:43.350+01:00 level=INFO source=types.go:42 msg="inference compute" id=00000000-c600-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="AMD Radeon 780M Graphics (RADV PHOENIX)" libdirs=ollama driver=0.0 pci_id=0000:c6:00.0 type=iGPU total="17.5 GiB" available="14.7 GiB"

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.13.0 for the docker container, 0.12.11 outside

GiteaMirror added the bug label 2026-04-22 18:08:49 -05:00

@as3ii commented on GitHub (Nov 25, 2025):

TBH the main goal was to use both GPUs in parallel for the same model, or to run two different models at the same time, [seeing that using both CUDA and Vulkan in parallel is not supported at this time](https://github.com/ollama/ollama/issues/13126#issuecomment-3554792576).


@mwprado commented on GitHub (Dec 8, 2025):

Can we use an Nvidia GPU with the ollama Vulkan backend? Creating a working CUDA environment seems hard to set up.


@ktrd734 commented on GitHub (Jan 3, 2026):

Workaround

I found a workaround for this and recently got it working on an NVidia GeForce GTX 770M (GK106M) on an Ubuntu 24.04.3 LTS host OS running Xorg (not Wayland); ollama is dockerized.

The main point is to add a library, libxext6, to the ollama docker container and to provide access to the host X server. libxext6 is a dependency of a proprietary NVidia library, /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0, that, AFAIK, provides Vulkan library functionality or has a role in it. (With the default, non-debug logging settings, loading this library fails silently because of the missing libxext6, and with it the initialization attempt of the NVidia Vulkan backend.)

Environment & prerequisites

  • using the current latest ollama/ollama:latest docker image
  • installing proprietary NVidia drivers suitable for your hardware
    • for me, these are the nvidia 470.256.02-0ubuntu0.24.04.1 packages
    • I have not tested other versions
  • installing nvidia-container-runtime - this injects the required NVidia proprietary libraries into the docker container (a quick sanity check is sketched after this list)
  • the user is in the video and render groups (I am not sure whether this is required)
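
A quick sanity check that the NVidia container runtime is wired up (a sketch; these are standard driver/Docker commands, not specific to this setup):

```
# Host driver alive?
nvidia-smi
# Is the nvidia runtime registered with Docker?
docker info | grep -i runtime
# Can a container see the GPU through the runtime?
docker run --rm --runtime=nvidia --gpus=all ubuntu nvidia-smi
```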

My docker startup script

Note that not everything here is required to make Vulkan-based acceleration work:

export OLLAMA_VULKAN=1
export GGML_VK_VISIBLE_DEVICES=0
export __NV_PRIME_RENDER_OFFLOAD=1
export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITIES=compute,utility
export OLLAMA_DEBUG=0
export OLLAMA_KV_CACHE_TYPE=q8_0
export OLLAMA_FLASH_ATTENTION=true

docker run -it \
        --runtime=nvidia \
        -v ollama:/root/.ollama -p 11434:11434 \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        --gpus=all \
        --device=/dev/dri \
        --group-add=video \
        -e DISPLAY=:0  \
        -e __NV_PRIME_RENDER_OFFLOAD=${__NV_PRIME_RENDER_OFFLOAD} \
        -e GGML_VK_VISIBLE_DEVICES=${GGML_VK_VISIBLE_DEVICES} \
        -e NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES} \
        -e NVIDIA_DRIVER_CAPABILITIES=${NVIDIA_DRIVER_CAPABILITIES} \
        -e OLLAMA_VULKAN=${OLLAMA_VULKAN} \
        -e OLLAMA_DEBUG=${OLLAMA_DEBUG} \
        -e OLLAMA_KV_CACHE_TYPE=${OLLAMA_KV_CACHE_TYPE} \
        -e OLLAMA_FLASH_ATTENTION=${OLLAMA_FLASH_ATTENTION} \
        --name ollama ollama/ollama

Preparation and startup steps

  • access to your X server must be provided to the docker container
    • for example, for the sake of simplicity, by issuing xhost + on your host OS - this is a known security risk, so find a more secure method
  • start ollama with the script above (or a customized version of it)
  • enter the ollama container, e.g. with docker exec -it ollama /bin/bash
  • execute this in the container console: apt update && apt install libxext6 vulkan-tools
  • execute this on the host: docker stop ollama; docker start ollama
  • check whether the NVidia card is listed in the output of vulkaninfo executed within the docker container, e.g.:
root@2cecad9abd02:/# vulkaninfo
...
Layers: count = 4
=================
VK_LAYER_INTEL_nullhw (INTEL NULL HW) Vulkan version 1.1.73, layer version 1:
        Layer Extensions: count = 0
        Devices: count = 3
                GPU id = 0 (NVIDIA GeForce GTX 770M)
                Layer-Device Extensions: count = 0
...
  • if it is, Vulkan acceleration should be working and the accelerated ollama is already running (the log check sketched below is another quick confirmation)
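
Independently of vulkaninfo, the discovered devices can also be read from the server log (a sketch; the "inference compute" lines match the log format shown earlier in this issue):

```
# library=Vulkan entries confirm the Vulkan backend picked up the card
docker logs ollama 2>&1 | grep "inference compute"
```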

Notes

  • if you delete the docker container after the preparation steps above, the changes are lost and you will have to repeat the libxext6 installation
  • a better solution is therefore to build a Docker image that contains libxext6 (a sketch follows)
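
A minimal sketch of such an image (untested; the file name and tag are just examples):

```
# Bake the extra packages into a derived image so they survive
# container re-creation, then run that image instead of ollama/ollama.
cat > Dockerfile.ollama-vulkan <<'EOF'
FROM ollama/ollama:latest
RUN apt-get update \
    && apt-get install -y --no-install-recommends libxext6 vulkan-tools \
    && rm -rf /var/lib/apt/lists/*
EOF
docker build -f Dockerfile.ollama-vulkan -t ollama-vulkan .
```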

@Arthur2500 commented on GitHub (Jan 21, 2026):

Here's how I got it working without Xorg or any window manager:

On the host, the required Vulkan and NVIDIA EGL components were installed:

sudo apt install -y nvidia-vulkan-icd nvidia-vulkan-common vulkan-tools libegl-nvidia0
After installation, the NVIDIA EGL and GLX libraries were verified:

ldconfig -p | grep -E 'libEGL_nvidia\.so\.0|libGLX_nvidia\.so\.0'
Since the Vulkan ICD definition for NVIDIA EGL was not available, it was added manually by creating the following file on the host:


/usr/share/vulkan/icd.d/nvidia_icd_egl.json

{
  "file_format_version": "1.0.0",
  "ICD": {
    "library_path": "libEGL_nvidia.so.0",
    "api_version": "1.3.277"
  }
}
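
To check that the loader actually picks this ICD up on the host, it can be pointed at the file explicitly (a sketch; vulkaninfo comes from the vulkan-tools package installed above):

```
# Restrict the Vulkan loader to the new NVIDIA EGL ICD and list devices;
# the NVIDIA card should appear in the summary.
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd_egl.json vulkaninfo --summary
```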

That ICD file was then mounted into the container as read-only:

/usr/share/vulkan/icd.d/nvidia_icd_egl.json:/usr/share/vulkan/icd.d/nvidia_icd_egl.json:ro
To ensure the runtime does not select CUDA, CUDA was explicitly disabled and the driver capabilities were constrained via environment variables:

CUDA_VISIBLE_DEVICES=-1
NVIDIA_DRIVER_CAPABILITIES=graphics,utility,compute
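
Put together, the container start might look like this (a sketch assembling the mount and variables above onto the standard dockerized ollama invocation; the exact command was not posted):

```
# Sketch: ICD mount plus the two override variables, with the Vulkan
# backend enabled on top of the usual ollama run flags.
docker run -d --gpus=all \
  -v ollama:/root/.ollama -p 11434:11434 \
  -v /usr/share/vulkan/icd.d/nvidia_icd_egl.json:/usr/share/vulkan/icd.d/nvidia_icd_egl.json:ro \
  -e OLLAMA_VULKAN=1 \
  -e CUDA_VISIBLE_DEVICES=-1 \
  -e NVIDIA_DRIVER_CAPABILITIES=graphics,utility,compute \
  --name ollama ollama/ollama
```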

After these changes, Ollama reported Vulkan as the active compute backend and detected both GPUs through Vulkan, including my AMD Vega 64 and NVIDIA 1080.

ollama  | time=2026-01-21T16:53:25.810Z level=INFO source=types.go:42 msg="inference compute" id=00000000-0000-0000-0500-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan1 description="AMD Radeon RX Vega (RADV VEGA10)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:00:05.0 type=discrete total="8.0 GiB" available="8.0 GiB"
ollama  | time=2026-01-21T16:53:25.810Z level=INFO source=types.go:42 msg="inference compute" id=caa72bae-d396-1977-1a2c-12fbd6e4cf25 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="NVIDIA GeForce GTX 1080" libdirs=ollama,vulkan driver=0.0 pci_id=0000:00:07.0 type=discrete total="8.2 GiB" available="6.2 GiB"
ollama  | time=2026-01-21T16:53:25.810Z level=INFO source=routes.go:1708 msg="entering low vram mode" "total vram"="16.2 GiB" threshold="20.0 GiB"

Finally, ollama ps confirmed that the model (larger than the VRAM of a single GPU) was running fully on both GPUs:

ollama ps

NAME          ID              SIZE     PROCESSOR    CONTEXT    UNTIL
gemma3:12b    f4031aab637d    11 GB    100% GPU     4096       Forever

@Arthur2500 commented on GitHub (Jan 28, 2026):

Never mind, it's now broken again.
