[GH-ISSUE #14702] ROCm error: invalid device function in ggml_cuda_mul_mat_q on RX 6750 GRE (gfx1031) with ollama-for-amd patch on Windows #35272

Closed
opened 2026-04-22 19:39:58 -05:00 by GiteaMirror · 2 comments

Originally created by @ducheng on GitHub (Mar 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14702

What is the issue?

This happens on both HSA_OVERRIDE_GFX_VERSION=10.3.0 and 10.3.1.

Environment

  • OS: Windows 10/11 (inferred from the C:/a/ollama/... path)
  • Ollama version: 0.17.7 (also tested v0.11.4, same crash)
  • GPU: AMD Radeon RX 6750 GRE 10GB (gfx1031, Navi 22)
  • Driver: AMD Adrenalin 60450.10 (or latest, same issue)
  • Patch used: ollama-for-amd (from https://github.com/likelovewant/ollama-for-amd) + ROCm libs (tried rocm.gfx1031.for.hip.sdk.6.1.2, 6.2.4, littlewu's logic variants)
  • Model: qwen2.5:7b (Q4_K_M), also tested smaller models
  • Environment variables: HSA_OVERRIDE_GFX_VERSION=10.3.0 (or 10.3.1), OLLAMA_DEBUG=1

Steps to reproduce

  1. Install Ollama for Windows
  2. Replace the rocm folder and rocblas library with the gfx1031 patch from likelovewant/ollama-for-amd or ROCmLibs-for-gfx1103-AMD780M-APU
  3. Set HSA_OVERRIDE_GFX_VERSION=10.3.0
  4. Run `ollama serve`
  5. Run `ollama run qwen2.5:7b` (or any model) → crash with the error shown in the logs below (see the PowerShell sketch after this list)
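
A minimal PowerShell sketch of these steps, assuming the environment variables listed above (the model name comes from the report; the prompt string is illustrative):

```powershell
# Advertise the closest officially supported ISA (gfx1030) to the HIP runtime
$env:HSA_OVERRIDE_GFX_VERSION = "10.3.0"
# Verbose logging so the runner prints the full ROCm error location
$env:OLLAMA_DEBUG = "1"

# Start the server in this terminal, then trigger the crash from a second one:
ollama serve
# (second terminal)
ollama run qwen2.5:7b "hello"
```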

Logs snippet

llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context: ROCm_Host output buffer size = 0.59 MiB
llama_kv_cache: ROCm0 KV buffer size = 224.00 MiB
llama_kv_cache: size = 224.00 MiB ( 4096 cells, 28 layers, 1/1 seqs), K (f16): 112.00 MiB, V (f16): 112.00 MiB
llama_context: Flash Attention was auto, set to enabled
ROCm error: invalid device function
current device: 0, in function ggml_cuda_mul_mat_q at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/mmq.cu:128
hipGetLastError()
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-08T07:24:03.618+08:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server error"
time=2026-03-08T07:24:03.774+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
time=2026-03-08T07:24:03.870+08:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\Administrator\.ollama\models\blobs\sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 error="llama runner process has terminated: ROCm error"
[GIN] 2026/03/08 - 07:24:03 | 500 | 7.3890475s | 127.0.0.1 | POST "/api/generate"

Additional info

Expected behavior

The model loads and offloads layers to the ROCm backend without a kernel execution error.

Possible cause

Likely a regression in the ggml-cuda/HIP kernels in newer Ollama versions that is incompatible with the gfx1031-patched rocblas library (Tensile kernels not rebuilt for the new ggml interface?).

Thanks for the great project! Happy to provide more logs or test patches.

Relevant log output


OS

windows 10

GPU

AMD Radeon RX 6750 GRE 10GB

CPU

i5-13400F

Ollama version

0.17.7

GiteaMirror added the bug label 2026-04-22 19:39:58 -05:00

@rick-github commented on GitHub (Mar 8, 2026):

Problems with ollama-for-amd should be posted in the [issue tracker](https://github.com/likelovewant/ollama-for-amd/issues) for ollama-for-amd. As a possible alternative, the [Vulkan](https://docs.ollama.com/gpu#vulkan-gpu-support) backend in ollama may support your GPU.


@Jasdfgh commented on GitHub (Mar 9, 2026):

The crash at mmq.cu:128 is in ggml's own HIP kernel, not in rocBLAS, so replacing rocBLAS/ROCm libraries won't help here. The official Ollama binary doesn't include gfx1031 in its build targets, and HSA_OVERRIDE_GFX_VERSION [doesn't work on Windows](https://github.com/ollama/ollama/issues/7694#issuecomment-2484191539) (same root cause as #7694).

Two paths forward:

  1. Easiest: try Vulkan. Set OLLAMA_VULKAN=1 to bypass ROCm entirely; no patching needed ([docs here](https://docs.ollama.com/gpu#vulkan-gpu-support)). It's experimental but should work with your GPU; see the PowerShell sketch after this list.
  2. ROCm path: you need the [ollama-for-amd](https://github.com/likelovewant/ollama-for-amd) binary (the ollama-windows-amd64.7z that replaces the whole rocm directory), not just the rocBLAS libs. That fork compiles with gfx1031 natively. Note: the latest release (v0.16.1) requires [ROCm 6.4.2 libs](https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.6.4.2), not the 6.1.2/6.2.4 you tried.
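
For option 1, a minimal PowerShell sketch (assuming the experimental OLLAMA_VULKAN flag described above; clearing the ISA override first is just a precaution, and this is untested on this exact card):

```powershell
# Drop the ROCm ISA override so it cannot interfere with backend selection
Remove-Item Env:HSA_OVERRIDE_GFX_VERSION -ErrorAction SilentlyContinue
# Opt in to the experimental Vulkan backend
$env:OLLAMA_VULKAN = "1"

# Restart the server so it re-detects devices, then check the log for a Vulkan
# device before loading a model from a second terminal:
ollama serve
# (second terminal)
ollama run qwen2.5:7b
```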

From your steps it looks like you installed official Ollama and only swapped the ROCm/rocblas files — can you confirm?
