[PR #15509] [CLOSED] Add OLLAMA_SKIP_GPU_VALIDATION env var to bypass broken GPU validation on Strix Halo (gfx1151) #77475

Closed
opened 2026-05-05 10:08:23 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15509
Author: @JeremiahM37
Created: 4/11/2026
Status: Closed

Base: main ← Head: strix-halo-gpu-fix


📝 Commits (1)

  • 6d333bb Add OLLAMA_SKIP_GPU_VALIDATION env var to bypass broken GPU validation

📊 Changes

3 files changed (+21 additions, -0 deletions)


📝 envconfig/config.go (+6 -0)
📝 ml/device.go (+6 -0)
📝 runner/ollamarunner/runner.go (+9 -0)

📄 Description

Problem

The GPU validation subprocess added in 0.18+ silently filters out AMD GPUs that crash during the deep init check. This affects AMD Strix Halo (gfx1151) and is reported in:

  • #15336 — "ollama 17.7 last version working on strix halo, all 18.x fallback to cpu"
  • #13589 — "gfx1151 silently falls back to CPU on Linux despite rocminfo detecting GPU"
  • #15261 — "Vulkan causing unrelated output with gemma4:e4b (AMD/Ryzen iGPU)"

Root cause

Two separate crashes prevent gfx1151 from working on 0.18+:

1. Bootstrap validation crash

NeedsInitValidation() triggers a runner subprocess with GGML_CUDA_INIT=1 that calls rocblas_initialize(). On gfx1151 with the bundled ROCm libraries, this crashes because TensileLibrary_lazy_gfx1151.dat cannot be loaded from the expected hipblaslt path. The Go discovery code sees only empty subprocess output, logs "filtering device which didn't fully initialize", and removes the GPU.
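To make the filtering step concrete, here is a minimal sketch of the discovery-side behavior; the function name and signature are hypothetical illustrations, not the actual code in ml/device.go:

```go
package main

import "fmt"

// filterValidated sketches the filtering described above: when the
// validation subprocess crashes before printing anything, its output is
// empty, which is indistinguishable from a device that genuinely failed
// to initialize, so every device gets dropped. Names are hypothetical.
func filterValidated(devices []string, subprocessOut string) []string {
	if subprocessOut == "" {
		// crash during rocblas_initialize() => empty output => GPU filtered
		return nil
	}
	return devices
}

func main() {
	fmt.Println(filterValidated([]string{"gfx1151"}, "")) // []
}
```

The point is that a crash and a legitimately unusable device produce the same (empty) signal, so the GPU disappears silently.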

2. Worst-case graph reservation crash

Even after working around the bootstrap, reserveWorstCaseGraph() in the new ollamarunner calls ggml_backend_sched_reserve() which crashes with SIGSEGV inside libamdhip64 — a HIP runtime memory allocator bug specific to gfx1151.

Fix

This patch adds an OLLAMA_SKIP_GPU_VALIDATION env var that:

  1. Skips NeedsInitValidation() for ROCm/CUDA devices (so the bootstrap subprocess uses bare device enumeration without the crashing rocblas init)
  2. Skips reserveWorstCaseGraph() in ollamarunner.allocModel() (memory is allocated lazily during inference instead, which works fine in practice)

The user takes responsibility for ensuring their GPU is actually compatible. This is documented in the env var description.

Tested

  • Hardware: AMD Ryzen AI MAX+ PRO 395 (Strix Halo, gfx1151), 96GB GTT
  • OS: Debian 12 in unprivileged Proxmox LXC, kernel 6.17
  • Drivers: mesa-vulkan-drivers 25.0.7 from bookworm-backports
  • Ollama: built from this branch

Results with OLLAMA_SKIP_GPU_VALIDATION=1 and OLLAMA_VULKAN=1:

| Model | Backend | Avg latency (warm) | Tokens/call |
|---|---|---|---|
| qwen3.5:4b | Vulkan (gfx1151) | 1.89s | ~63 |
| qwen3.5:4b | CPU (without patch) | 15.6s | ~155 |

Performance via Vulkan is comparable to or faster than 0.17.7 with native ROCm support. Full 33/33 layers offload to GPU. KHR_coopmat cooperative matrix support is active.
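For reference, a sketch of the environment used for the runs above; the binary path assumes a local build from this branch and is not part of the PR:

```shell
# Opt out of the crashing init validation and select the Vulkan backend.
export OLLAMA_SKIP_GPU_VALIDATION=1
export OLLAMA_VULKAN=1
env | grep '^OLLAMA_'   # sanity-check both variables are set
# ./ollama serve        # then start the server built from this branch
```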

Risk

  • Low blast radius: opt-in via env var, no behavior change for users who don't set it
  • No new dependencies: uses existing envconfig package
  • Backwards compatible: existing GPU validation logic untouched

Future work

The underlying bugs in rocblas Tensile library loading and the HIP memory allocator should ideally be fixed upstream, but this patch gives Strix Halo users a working escape hatch in the meantime without requiring a fork.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:08:23 -05:00
Reference: github-starred/ollama#77475