[GH-ISSUE #15752] CUDA_VISIBLE_DEVICES=-1 incorrectly suppresses ROCm GPU detection #72101

Open
opened 2026-05-05 03:28:55 -05:00 by GiteaMirror · 1 comment

Originally created by @galuszkak on GitHub (Apr 22, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15752

What is the issue?

Setting CUDA_VISIBLE_DEVICES=-1 to hide NVIDIA GPUs unexpectedly causes the AMD ROCm GPU to disappear from inference compute as well, leaving only CPU available.

Steps to reproduce:

  1. Start ollama serve without restrictions — both CUDA and ROCm GPUs are discovered:

    inference compute ... library=CUDA ... name=CUDA0 description="NVIDIA GeForce RTX 5070 Laptop GPU"
    inference compute ... library=ROCm ... name=ROCm0 description="AMD Radeon 890M Graphics"
    
  2. Stop the server, then run:

    CUDA_VISIBLE_DEVICES=-1 ollama serve
    
  3. Observe that only CPU is listed under inference compute; the ROCm GPU is no longer detected (see the driver-level check sketched below).
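
A driver-level check, independent of Ollama, is a quick way to confirm the iGPU itself is still usable while the variable is set. This is a sketch assuming nvidia-smi and rocminfo are installed; as far as I can tell, rocminfo talks to the ROCr runtime, which honours ROCR_VISIBLE_DEVICES but not the HIP/CUDA variables:

    # NVIDIA driver view (unaffected by Ollama's discovery):
    nvidia-smi -L

    # ROCr-level view; the AMD iGPU should still be listed here even with the
    # variable exported, which would point at the HIP layer or Ollama's
    # discovery, not the kernel driver, as the place the GPU disappears.
    CUDA_VISIBLE_DEVICES=-1 rocminfo | grep -E "Marketing Name|gfx"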

Expected behavior:

CUDA_VISIBLE_DEVICES should affect only NVIDIA (CUDA) devices. The AMD iGPU (Radeon 890M) should still be discovered via ROCm and remain available for inference when this variable is set.

Actual behavior:

The ROCm GPU is completely omitted from the compute list, forcing CPU-only inference despite the AMD iGPU being present and functional.

Relevant log output

galuszkak@Thunder:~$ ollama serve 
time=2026-04-22T20:37:41.008+02:00 level=INFO source=routes.go:1752 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/galuszkak/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
[...]
time=2026-04-22T20:37:42.087+02:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-0eee6825-5aa8-a022-085a-6c135272070d filter_id="" library=CUDA compute=12.0 name=CUDA0 description="NVIDIA GeForce RTX 5070 Laptop GPU" libdirs=ollama,cuda_v13 driver=13.0 pci_id=0000:c1:00.0 type=discrete total="8.0 GiB" available="7.2 GiB"
time=2026-04-22T20:37:42.087+02:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1150 name=ROCm0 description="AMD Radeon 890M Graphics" libdirs=ollama,rocm driver=70253.21 pci_id=0000:c2:00.0 type=iGPU total="35.6 GiB" available="35.4 GiB"
time=2026-04-22T20:37:42.087+02:00 level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="43.6 GiB" default_num_ctx=32768


And here is the output with CUDA_VISIBLE_DEVICES=-1:

galuszkak@Thunder:~$ CUDA_VISIBLE_DEVICES=-1 ollama serve 
time=2026-04-22T20:37:54.239+02:00 level=INFO source=routes.go:1752 msg="server config" env="map[CUDA_VISIBLE_DEVICES:-1 GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/galuszkak/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
[...]
time=2026-04-22T20:37:54.241+02:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-22T20:37:54.241+02:00 level=WARN source=runner.go:485 msg="user overrode visible devices" CUDA_VISIBLE_DEVICES=-1
time=2026-04-22T20:37:54.242+02:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2026-04-22T20:37:54.242+02:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 46377"
time=2026-04-22T20:37:54.318+02:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36391"
time=2026-04-22T20:37:54.394+02:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41067"
time=2026-04-22T20:37:54.576+02:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-04-22T20:37:54.576+02:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="23.3 GiB" available="20.4 GiB"
time=2026-04-22T20:37:54.576+02:00 level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096

OS

Linux

GPU

Nvidia, AMD

CPU

AMD

Ollama version

0.21.0

GiteaMirror added the bug label 2026-05-05 03:28:55 -05:00

@galuszkak commented on GitHub (Apr 25, 2026):

After some investigation I found this in the ROCm documentation:
https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#cuda-visible-devices

    CUDA_VISIBLE_DEVICES
    Provided for CUDA compatibility, has the same effect as HIP_VISIBLE_DEVICES on the AMD platform.

I'm not sure what to do with this now: it seems CUDA_VISIBLE_DEVICES is expected to select AMD devices as well. Perhaps this bug should be closed because of that, or at least a note should be added to the documentation so people are aware of this behaviour, since it's not easy to find.
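
Given that aliasing, a possible workaround (untested here) is to pin the AMD device explicitly. As far as I know, the HIP runtime falls back to CUDA_VISIBLE_DEVICES only when HIP_VISIBLE_DEVICES is unset; whether Ollama's own discovery preserves that distinction is an assumption:

    # Assumption: HIP_VISIBLE_DEVICES takes precedence over the
    # CUDA_VISIBLE_DEVICES compatibility alias on the AMD platform, so the
    # iGPU (HIP device 0) should stay visible while the NVIDIA GPU is hidden
    # from the CUDA runtime.
    CUDA_VISIBLE_DEVICES=-1 HIP_VISIBLE_DEVICES=0 ollama serve

If Ollama applies its own filtering on top (the "user overrode visible devices" warning in the log suggests it inspects these variables), the server may still need a fix to scope CUDA_VISIBLE_DEVICES to the CUDA backend only.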
