[GH-ISSUE #14562] Vulkan backend fails to detect AMD Radeon 860M iGPU (gfx1152/RDNA 3.5) on Windows #35205

Open
opened 2026-04-22 19:34:51 -05:00 by GiteaMirror · 2 comments

Originally created by @vicduch on GitHub (Mar 2, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14562

System Info

  • OS: Windows 11 Build 26200
  • CPU: AMD Ryzen AI 7 350 (Zen 5)
  • GPU: AMD Radeon 860M (RDNA 3.5, gfx1152) — integrated GPU
  • VRAM: 8 GB allocated (shared from 32 GB system RAM)
  • Driver: AMD Adrenalin 24.30.62.28
  • Ollama: v0.17.4 and v0.17.5 (both affected)

Vulkan system status

vulkaninfo --summary correctly detects the GPU:

GPU0: AMD Radeon(TM) 860M Graphics
apiVersion = 1.3.302
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
vendorID = 0x1002
deviceID = 0x1114
driverID = DRIVER_ID_AMD_PROPRIETARY

Problem

With OLLAMA_VULKAN=1, the Vulkan discovery runner finds zero devices, even though Vulkan is fully functional at the system level. The runner falls back to CPU with total_vram="0 B".

Debug logs

With OLLAMA_DEBUG=true and OLLAMA_NEW_ENGINE=true:

1. Vulkan runner — no device found

bootstrap discovery took 690ms
OLLAMA_LIBRARY_PATH=[...\lib\ollama ...\lib\ollama\vulkan]
extra_envs=map[]

No GPU reported. Falls through to CPU.

2. ROCm runner — device found but rejected

verifying if device is supported
  library=...\lib\ollama\rocm
  description="AMD Radeon(TM) 860M Graphics"
  compute=gfx1152
  id=0
  pci_id=0000:63:00.0

filtering device which didn't fully initialize
  id=0
  libdir=...\lib\ollama\rocm
  pci_id=0000:63:00.0
  library=ROCm

The device is detected by ROCm but fails verification (gfx1152 is not in the supported-targets list). Setting HSA_OVERRIDE_GFX_VERSION=11.0.0 does not help; the device is still filtered.
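The rejection pattern is consistent with an allowlist check on the reported gfx target. A minimal sketch of such a gate (hypothetical Go; the target list here is illustrative, not Ollama's actual table):

```go
package main

import "fmt"

// supportedGfx is an illustrative allowlist of RDNA 2/3 targets.
// Ollama's real list lives in its ROCm discovery code; the point is
// only that gfx1152 (RDNA 3.5) is absent from it.
var supportedGfx = map[string]bool{
	"gfx1030": true,
	"gfx1100": true,
	"gfx1101": true,
	"gfx1102": true,
}

// isSupported mirrors the "verifying if device is supported" step:
// a device whose compute target is not in the list gets filtered.
func isSupported(compute string) bool {
	return supportedGfx[compute]
}

func main() {
	fmt.Println(isSupported("gfx1152")) // the 860M's target is absent
	fmt.Println(isSupported("gfx1100"))
}
```

Under this reading, adding gfx1152 to the list (or mapping it to a nearby supported target) would be the ROCm-side fix, independent of the Vulkan enumeration bug.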

Final result

inference compute id=cpu library=cpu total="23.8 GiB"
vram-based default context total_vram="0 B"

Root cause hypothesis

Ollama bundles vulkan-1.dll v1.4.321.1 in lib/ollama/vulkan/. Replacing it with the system's vulkan-1.dll v1.3.300.0 (which vulkaninfo uses successfully) did not fix the issue. The problem appears to be in ggml-vulkan.dll's device enumeration — it may be filtering out integrated GPUs with UMA (unified memory architecture) or devices reporting 0 bytes of dedicated VRAM.
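The UMA hypothesis can be made concrete with a small sketch (hypothetical Go; the types and field names are illustrative, not Ollama's actual code). A discovery filter that keys on dedicated VRAM alone will discard an integrated GPU, because a UMA device reports 0 bytes of host-invisible device-local memory; the fix is to fall back to the shared, host-visible heap for UMA devices:

```go
package main

import "fmt"

// vulkanDevice is a hypothetical stand-in for the properties read during
// Vulkan device enumeration; field names are illustrative.
type vulkanDevice struct {
	name          string
	uma           bool   // unified memory architecture (integrated GPU)
	dedicatedVRAM uint64 // device-local heap not visible to the host
	sharedHeap    uint64 // host-visible heap (system RAM on iGPUs)
}

// usableVRAM returns the memory budget discovery should report.
// A naive filter that only checks dedicatedVRAM would return 0 for the
// Radeon 860M and drop it; for UMA devices, report the shared heap instead.
func usableVRAM(d vulkanDevice) uint64 {
	if d.uma {
		return d.sharedHeap
	}
	return d.dedicatedVRAM
}

func main() {
	igpu := vulkanDevice{
		name:       "AMD Radeon(TM) 860M Graphics",
		uma:        true,
		sharedHeap: 8 << 30, // 8 GiB carved out of system RAM
	}
	fmt.Printf("%s: %d GiB usable\n", igpu.name, usableVRAM(igpu)>>30)
}
```

The KoboldCpp log below (`uma: 1`) shows its ggml-vulkan build handles exactly this case, which is why the same hardware enumerates there.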

Proof that Vulkan works on this GPU

KoboldCpp v1.108.2 using its own build of ggml-vulkan detects the GPU immediately:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) 860M Graphics (AMD proprietary driver)
  | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64
  | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat

llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon(TM) 860M Graphics)
load_tensors: offloaded 33/33 layers to GPU

Performance: Prefill 750 tok/s, Generation 8.6 tok/s — fully functional.

Expected behavior

OLLAMA_VULKAN=1 should detect and use the AMD Radeon 860M via Vulkan, as KoboldCpp does successfully with the same hardware, driver, and Vulkan API.

Workaround

Use KoboldCpp with --usevulkan --gpulayers 99 pointing to the GGUF blob in ~/.ollama/models/blobs/.


@Jasdfgh commented on GitHub (Mar 12, 2026):

nice writeup. the koboldcpp comparison is really the key piece — their ggml-vulkan picks up the 860M fine, so it's not a driver or system-level thing, it's somewhere in how ollama's vulkan build does device enumeration. probably filtering out devices with 0 dedicated VRAM or something UMA-related?

the vulkan backend is actually maintained upstream at ggml-org/llama.cpp, not in ollama itself — so might be worth cross-filing there. your logs from koboldcpp showing it working on the exact same hardware would make it pretty easy for them to narrow down.

<!-- gh-comment-id:4045392118 -->

@hidp123 commented on GitHub (Apr 2, 2026):

Any solution yet to get Ollama to use the GPU on the AMD Ryzen AI 7 350 w/ Radeon 860M?

<!-- gh-comment-id:4179154941 -->
Reference: github-starred/ollama#35205