[GH-ISSUE #15420] Bug: --ollama-engine runner ignores GGML_CUDA_INIT=1, breaking ROCm bootstrap on gfx1151 APUs #35619

Closed
opened 2026-04-22 20:15:50 -05:00 by GiteaMirror · 8 comments

Originally created by @ftmng on GitHub (Apr 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15420

Bug: --ollama-engine runner ignores GGML_CUDA_INIT=1, breaking ROCm bootstrap on gfx1151 APUs

System Information

  • CPU/GPU: AMD Ryzen AI MAX+ 395 (Strix Halo) — Radeon 8060S Graphics (gfx1151)
  • RAM: 128 GB unified memory (~92 GB available as GPU VRAM via Vulkan, ~64 GB dedicated VRAM)
  • OS: Ubuntu 25.10, kernel 6.x
  • ROCm: 7.2.1 at /opt/rocm-7.2.1
  • Ollama: v0.20.3
  • Vulkan: mesa-vulkan-drivers 25.2.8, libvulkan1 1.4.321

Summary

Ollama 0.20.3 uses a two-step GPU bootstrap:

  1. Discovery probe (no GGML_CUDA_INIT) → finds GPU ✓
  2. Verification probe (with GGML_CUDA_INIT=1) → supposed to confirm GPU can init → always fails

The verification probe starts the new --ollama-engine runner subprocess, which:

  • Starts and listens on a port
  • Returns {"status":2,"progress":0} from /health
  • Never opens /dev/kfd — GPU is never initialized
  • Status stays at 2 forever (never becomes 1 = "ready")
  • After ~91-101ms Ollama kills it and logs: filtering device which didn't fully initialize

Root cause: GGML_CUDA_INIT=1 was an env var for the old llama.cpp/ggml runner. The new --ollama-engine runner (Go-based) does not implement it. The runner ignores the variable entirely and never initializes the GPU during the bootstrap health check. This means ROCm devices that require NeedsInitValidation() (all ROCm and CUDA devices per ml/device.go:541) are always filtered out on 0.20.x.

What Works

  • rocminfo + rocm-smi recognize the GPU with HSA_OVERRIDE_GFX_VERSION=11.5.1
  • gfx1151 kernels confirmed compiled into /usr/local/lib/ollama/rocm/libggml-hip.so
  • All ROCm shared libraries load fine (ldd shows no missing deps)
  • ollama user is in render + video groups → has /dev/kfd and /dev/dri access
  • Discovery probe finds the GPU correctly (finds "Radeon 8060S Graphics, gfx1151")
  • Vulkan backend works with OLLAMA_VULKAN=1 — 4.12 tok/s on qwen2.5:72b-instruct-q5_K_M (100% GPU, 92 GiB available)
  • CPU inference works but at half speed (2.13 tok/s on same model)

Detailed Evidence

1. Discovery succeeds, verification fails

From OLLAMA_DEBUG=1 journal output:

discovering available GPUs...
bootstrap discovery took duration=189.56ms OLLAMA_LIBRARY_PATH="[... /rocm]"
evaluating which, if any, devices to filter out initial_count=1
verifying if device is supported library=/usr/local/lib/ollama/rocm description="Radeon 8060S Graphics" compute=gfx1151 id=0
starting runner cmd="/usr/local/bin/ollama runner --ollama-engine --port 37243"
subprocess ... LD_LIBRARY_PATH=.../rocm ROCR_VISIBLE_DEVICES=0 GGML_CUDA_INIT=1
bootstrap discovery took duration=101.133ms ... extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
filtering device which didn't fully initialize id=0 libdir=/usr/local/lib/ollama/rocm library=ROCm
inference compute id=cpu library=cpu ... total="61.2 GiB" available="54.4 GiB"

2. Runner permanently stuck at status:2

Manually starting the runner with the exact same env vars Ollama uses:

HSA_OVERRIDE_GFX_VERSION=11.5.1 \
LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm \
ROCR_VISIBLE_DEVICES=0 \
GGML_CUDA_INIT=1 \
/usr/local/bin/ollama runner --ollama-engine --port 55555

Polling health:

sleep 1;  curl -s http://127.0.0.1:55555/health → {"status":2,"progress":0}
sleep 5;  curl -s http://127.0.0.1:55555/health → {"status":2,"progress":0}
sleep 10; curl -s http://127.0.0.1:55555/health → {"status":2,"progress":0}

Same result without GGML_CUDA_INIT=1 — confirming the variable is completely ignored.

lsof confirms the runner never opens /dev/kfd — the GPU is never touched.

3. Vulkan works perfectly

With OLLAMA_VULKAN=1 and VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.json:

inference compute id=... library=Vulkan name=Vulkan0 description="AMD Radeon Graphics (RADV GFX1151)" total="94.6 GiB" available="92.2 GiB"
ollama ps → qwen2.5:72b-instruct-q5_K_M  68 GB  100% GPU
eval rate: 4.12 tokens/s

4. OLLAMA_NEW_ENGINE=false has no effect

Setting OLLAMA_NEW_ENGINE=false in the systemd override does not change behavior — the runner still starts with --ollama-engine.

Code Analysis

The issue is in the interaction between two functions in ml/device.go:

// Line 538
func (d DeviceInfo) NeedsInitValidation() bool {
    return d.Library == "ROCm" || d.Library == "CUDA"
}

// Line 545
func (d DeviceInfo) AddInitValidation(env map[string]string) {
    env["GGML_CUDA_INIT"] = "1"
}

And discover/runner.go line 152:

if len(bootstrapDevices(ctx2ndPass, devices[i].LibraryPath, extraEnvs)) == 0 {
    // filtering device which didn't fully initialize
    needsDelete[i] = true
}

The old llama.cpp runner honored GGML_CUDA_INIT=1 by deeply initializing the GPU during startup. The new --ollama-engine runner doesn't implement this — it starts in an idle state (status:2) waiting for a model load, regardless of GGML_CUDA_INIT.

Proposed Fix

Option A (minimal, targeted): Skip init validation for gfx115x APUs since the verification mechanism is broken:

func (d DeviceInfo) NeedsInitValidation() bool {
    if d.Library == "ROCm" && strings.HasPrefix(d.Compute(), "gfx115") {
        return false
    }
    return d.Library == "ROCm" || d.Library == "CUDA"
}

Option B (proper fix): Make the --ollama-engine runner implement GGML_CUDA_INIT=1 by actually initializing the GPU backend during startup and reporting status:1 once successful.

Option C (pragmatic): Increase the bootstrap timeout and have the runner respond to GGML_CUDA_INIT=1 with a quick GPU probe that returns status:1 if /dev/kfd is accessible and the ROCm runtime loads.

Current Workaround

Vulkan backend via systemd override:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"
Environment="OLLAMA_KEEP_ALIVE=-1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_VULKAN=1"
Environment="VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.json"

This works (4.12 tok/s on 72B) but native ROCm should be significantly faster based on community reports (~40 tok/s on 30B models with v0.18.0 where the old runner still worked).

Related Issues

  • #14855 — Working ROCm guide for gfx1151 (uses v0.18.0, before the --ollama-engine change)
  • #13589 — gfx1151 silently falls back to CPU on Linux
  • #12062 — APU VRAM/GTT memory reporting issue
  • rjmalagon/ollama-linux-amd-apu#37 — Identical bootstrap failure with ROCm v7 and gfx1151

@ftmng commented on GitHub (Apr 8, 2026):


Update: Source patch confirms two separate bugs

Discovery bug — FIXED with source patch

We patched NeedsInitValidation() in ml/device.go to skip init validation for gfx115x APUs and rebuilt the binary via Docker (golang:1.24 image, CGO_ENABLED=1 go build -trimpath -buildmode=pie).

The patch:

func (d DeviceInfo) NeedsInitValidation() bool {
    // Skip init validation for ROCm APUs (gfx115x) - the ollama-engine runner
    // does not implement GGML_CUDA_INIT, so validation always fails
    if d.Library == "ROCm" && strings.HasPrefix(d.Compute(), "gfx115") {
        return false
    }
    return d.Library == "ROCm" || d.Library == "CUDA"
}

Result: ROCm is now fully recognized:

inference compute id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 
  description="Radeon 8060S Graphics" libdirs=ollama,rocm driver=70226.1 
  pci_id=0000:c5:00.0 type=iGPU total="94.6 GiB" available="91.8 GiB"
vram-based default context total_vram="94.6 GiB" default_num_ctx=262144

No more "filtering device which didn't fully initialize". The discovery bug from the original report is confirmed and the fix works.

Note: we also attempted a binary hex-patch approach before the Docker build. It failed because the Go compiler inlines NeedsInitValidation() at multiple call sites — we found 10 occurrences of the ROCm 4-byte comparison pattern in the binary, of which 5 matched the NeedsInitValidation signature. Patching all of them still didn't work, confirming that a proper source rebuild is necessary.

Runtime bug — NEW, separate issue

With the patched binary, ROCm discovery succeeds but model loading crashes with SIGSEGV:

ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from /usr/local/lib/ollama/rocm/libggml-hip.so

ggml_backend_cuda_device_get_memory device 0000:c5:00.0
utilizing AMD specific memory reporting free: 98618327040 total: 101566676992
llama_model_load_from_file_impl: using device ROCm0 (Radeon 8060S Graphics) - 94049 MiB free

load_tensors: CPU_Mapped model buffer size = 266.16 MiB
load_tensors: ROCm0 model buffer size = 1252.41 MiB
SIGSEGV: segmentation violation
PC=0x79add8441170 m=9 sigcode=1 addr=0x34
signal arrived during cgo execution

The crash occurs inside llama_model_load_from_file during tensor buffer allocation. This happens with both small (llama3.2:1b, 1.2 GiB) and large (qwen2.5:72b, 68 GiB) models. The last log line before the crash is:

load_tensors: tensor 'token_embd.weight' (q5_K) cannot be used with preferred 
  buffer type ROCm_Host, using CPU instead

Tested with Ollama's bundled ROCm libraries (version 7.2.0 based on shared library versions). System ROCm is 7.2.1 but we did not separately test with system libraries replacing the bundled ones — only OLLAMA_LLM_LIBRARY=rocm was set, which still uses Ollama's bundled libggml-hip.so.

System memory layout:

mem_info_vram_total: 68719476736  (~64 GiB dedicated)
mem_info_gtt_total:  32847200256  (~31 GiB GTT)
mem_info_gtt_used:      154226688 (~147 MiB)
Kernel boot param: amdgpu.cwsr_enable=0

Summary

There are two bugs affecting gfx1151 on Ollama 0.20.3:

Bug | Component | Status | Workaround
--- | --- | --- | ---
Discovery: --ollama-engine runner ignores GGML_CUDA_INIT=1, GPU always filtered | ml/device.go / discover/runner.go | Fix confirmed (patch above) | OLLAMA_VULKAN=1
Runtime: SIGSEGV in libggml-hip.so during tensor buffer allocation on gfx1151 APU | libggml-hip.so / ggml-cuda | Open | OLLAMA_VULKAN=1

@rick-github commented on GitHub (Apr 8, 2026):

ollama  | time=2026-04-08T12:23:27.480Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="Radeon 8060S Graphics" libdirs=ollama,rocm driver=70226.1 pci_id=0000:c6:00.0 type=iGPU total="111.5 GiB" available="111.3 GiB"

0.20.3-rocm discovers the GPU fine on my gfx1151, which suggests that while your patch masks the problem, it doesn't fix the root cause.


@ftmng commented on GitHub (Apr 8, 2026):


Update: Found a working Docker image, still broken on 0.20.3

Working setup

ollama/ollama:0.16.0-rocm in Docker gives full native ROCm acceleration on gfx1151:

RENDER_GID=$(getent group render | cut -d: -f3)
VIDEO_GID=$(getent group video | cut -d: -f3)

docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add $VIDEO_GID \
  --group-add $RENDER_GID \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  -p 11434:11434 \
  -e OLLAMA_HOST=0.0.0.0 \
  -e HSA_OVERRIDE_GFX_VERSION=11.5.1 \
  -e OLLAMA_KEEP_ALIVE=-1 \
  -e OLLAMA_FLASH_ATTENTION=1 \
  -e OLLAMA_DEBUG=1 \
  ollama/ollama:0.16.0-rocm

Note: use numeric GIDs for --group-add because render doesn't exist as a named group inside the container.

inference compute id=0 library=ROCm compute=gfx1151 name=ROCm0 
  description="AMD Radeon Graphics" driver=60342.13 type=iGPU 
  total="94.6 GiB" available="92.4 GiB"

Performance on qwen2.5:72b-instruct-q5_K_M (68 GB model)

Backend | Prompt eval | Generation
--- | --- | ---
ROCm native (Docker 0.16.0-rocm) | 117 tok/s | 4.4 tok/s
Vulkan (native 0.20.3, OLLAMA_VULKAN=1) | 41 tok/s | 4.1 tok/s
CPU fallback | 7.5 tok/s | 2.1 tok/s

Still broken on 0.20.3

We tested every combination we could think of on 0.20.3 — none of them result in working ROCm inference:

  • Native 0.20.3 — GPU discovered but filtered during verification (filtering device which didn't fully initialize)
  • Native 0.20.3 + source patch — we patched NeedsInitValidation() in ml/device.go to skip verification for gfx115x, rebuilt via Docker (golang:1.24). Discovery succeeds (library=ROCm total="94.6 GiB"), but model loading crashes with SIGSEGV in libggml-hip.so during buffer allocation
  • Docker 0.20.3-rocm — same filtering behavior as native
  • Docker 0.18.0-rocm — same filtering behavior
  • Docker 0.16.0-rocm — works

The only workaround on 0.20.3 is OLLAMA_VULKAN=1, which provides GPU acceleration but with slower prompt processing (41 vs 117 tok/s).

What might help fix this for newer versions

The working 0.16.0-rocm image bundles ROCm 6.x libraries (driver=60342.13) and uses the older runner. Something changed between 0.16.0 and 0.18.0 that broke both the GPU bootstrap verification and the ROCm runtime on gfx1151. Comparing what changed in the discovery/runner code and the bundled ROCm libraries between these versions might help identify the regression.

Note on GPU memory

On Strix Halo, the default VRAM allocation may be too small for large models via ROCm. If you see cudaMalloc failed: out of memory, increase the GPU memory allocation in BIOS/UEFI.


@ftmng commented on GitHub (Apr 8, 2026):

> ollama  | time=2026-04-08T12:23:27.480Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="Radeon 8060S Graphics" libdirs=ollama,rocm driver=70226.1 pci_id=0000:c6:00.0 type=iGPU total="111.5 GiB" available="111.3 GiB"
>
> 0.20.3-rocm discovers the GPU fine on my gfx1151, which suggests that while your patch masks the problem, it doesn't fix the root cause.

Thanks for the reply. We actually did test ollama/ollama:0.20.3-rocm in Docker with GPU passthrough (--device /dev/kfd --device /dev/dri) and it showed the same filtering behavior on our system:

filtering device which didn't fully initialize id=0 libdir=/usr/lib/ollama/rocm library=ROCm
inference compute id=cpu

Could you share your full setup?

On our end, ollama/ollama:0.16.0-rocm in Docker is the only configuration that gives us working ROCm inference.


@rick-github commented on GitHub (Apr 8, 2026):

$ docker run --rm --device /dev/kfd --device /dev/dri -e OLLAMA_DEBUG=2 ollama/ollama:0.20.3-rocm
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBIatb+Fu9PNt4ex9/iczk9AEAA1AunFxG3VkMg+1Tcc

time=2026-04-08T14:46:15.000Z level=INFO source=routes.go:1744 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-04-08T14:46:15.000Z level=INFO source=routes.go:1746 msg="Ollama cloud disabled: false"
time=2026-04-08T14:46:15.000Z level=INFO source=images.go:499 msg="total blobs: 0"
time=2026-04-08T14:46:15.000Z level=INFO source=images.go:506 msg="total unused blobs removed: 0"
time=2026-04-08T14:46:15.000Z level=INFO source=routes.go:1802 msg="Listening on [::]:11434 (version 0.20.3)"
time=2026-04-08T14:46:15.000Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-04-08T14:46:15.000Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-08T14:46:15.000Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/rocm]" extraEnvs=map[]
time=2026-04-08T14:46:15.000Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36913"
time=2026-04-08T14:46:15.001Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm
time=2026-04-08T14:46:15.007Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-08T14:46:15.008Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:36913"
time=2026-04-08T14:46:15.019Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-04-08T14:46:15.019Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.file_type default=0
time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.name default=""
time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.description default=""
time=2026-04-08T14:46:15.019Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-04-08T14:46:15.022Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
time=2026-04-08T14:46:15.074Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.block_count default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.pooling_type default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.expert_count default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.block_count default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.embedding_length default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-04-08T14:46:15.074Z level=DEBUG source=runner.go:1392 msg="dummy model load took" duration=62.205584ms
ggml_hip_get_device_memory searching for device 0000:c6:00.0
ggml_backend_cuda_device_get_memory device 0000:c6:00.0 utilizing AMD specific memory reporting free: 119539990528 total: 119713927168
time=2026-04-08T14:46:15.075Z level=DEBUG source=runner.go:1397 msg="gathering device infos took" duration=237.084µs
time=2026-04-08T14:46:15.075Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:Radeon 8060S Graphics FilterID: Integrated:true PCIID:0000:c6:00.0 TotalMemory:119713927168 FreeMemory:119539990528 ComputeMajor:17 ComputeMinor:81 DriverMajor:70226 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]"
time=2026-04-08T14:46:15.075Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=75.056478ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=map[]
time=2026-04-08T14:46:15.075Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
time=2026-04-08T14:46:15.075Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/rocm description="Radeon 8060S Graphics" compute=gfx1151 id=0 pci_id=0000:c6:00.0
time=2026-04-08T14:46:15.075Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/rocm]" extraEnvs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
time=2026-04-08T14:46:15.076Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37447"
time=2026-04-08T14:46:15.076Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm ROCR_VISIBLE_DEVICES=0 GGML_CUDA_INIT=1
time=2026-04-08T14:46:15.084Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-08T14:46:15.084Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:37447"
time=2026-04-08T14:46:15.087Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string
time=2026-04-08T14:46:15.087Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.file_type default=0
time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.name default=""
time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.description default=""
time=2026-04-08T14:46:15.087Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-04-08T14:46:15.090Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
ggml_cuda_init: initializing rocBLAS on device 0
ggml_cuda_init: rocBLAS initialized on device 0
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
time=2026-04-08T14:46:15.529Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.block_count default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.pooling_type default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.expert_count default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.block_count default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.embedding_length default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.head_count default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.key_length default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2026-04-08T14:46:15.529Z level=DEBUG source=runner.go:1392 msg="dummy model load took" duration=442.444552ms
ggml_hip_get_device_memory searching for device 0000:c6:00.0
ggml_backend_cuda_device_get_memory device 0000:c6:00.0 utilizing AMD specific memory reporting free: 119220338688 total: 119713927168
time=2026-04-08T14:46:15.529Z level=DEBUG source=runner.go:1397 msg="gathering device infos took" duration=222.874µs
time=2026-04-08T14:46:15.530Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:Radeon 8060S Graphics FilterID: Integrated:true PCIID:0000:c6:00.0 TotalMemory:119713927168 FreeMemory:119220338688 ComputeMajor:17 ComputeMinor:81 DriverMajor:70226 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]"
time=2026-04-08T14:46:15.530Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=454.482997ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
time=2026-04-08T14:46:15.530Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[ROCm:map[/usr/lib/ollama/rocm:map[0:0]]]
time=2026-04-08T14:46:15.530Z level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=0 new_ID=0
time=2026-04-08T14:46:15.530Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=529.880267ms
time=2026-04-08T14:46:15.530Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="Radeon 8060S Graphics" libdirs=ollama,rocm driver=70226.1 pci_id=0000:c6:00.0 type=iGPU total="111.5 GiB" available="111.3 GiB"
time=2026-04-08T14:46:15.530Z level=INFO source=routes.go:1852 msg="vram-based default context" total_vram="111.5 GiB" default_num_ctx=262144
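As a sanity check, the byte counts in the `ggml_backend_cuda_device_get_memory` lines above line up with the GiB figures in the final `inference compute` log entry (a quick conversion sketch; the constants are copied straight from the log):

```python
# Byte counts copied from the ggml memory-reporting lines for device 0000:c6:00.0.
total = 119713927168  # total VRAM reported
free = 119539990528   # free VRAM from the first discovery probe

# GiB figures as printed in the "inference compute" log line.
print(f"total: {total / 2**30:.1f} GiB")      # total: 111.5 GiB
print(f"available: {free / 2**30:.1f} GiB")   # available: 111.3 GiB
```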
$ cat /opt/rocm-7.2.1/.info/version
7.2.1
$ rocminfo
ROCk module version 6.16.13 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.18
Runtime Ext Version:     1.15
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      49152(0xc000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5185                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32489676(0x1efc0cc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    32489676(0x1efc0cc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32489676(0x1efc0cc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32489676(0x1efc0cc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1151                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 5510(0x1586)                       
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2900                               
  BDFID:                   50688                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        2147483647(0x7fffffff)             
    y                        65535(0xffff)                      
    z                        65535(0xffff)                      
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 26                                 
  SDMA engine uCode::      14                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    100663296(0x6000000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    100663296(0x6000000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1151         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)             
        y                        65535(0xffff)                      
        z                        65535(0xffff)                      
      FBarrier Max Size:       32                                 
    ISA 2                    
      Name:                    amdgcn-amd-amdhsa--gfx11-generic   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)             
        y                        65535(0xffff)                      
        z                        65535(0xffff)                      
      FBarrier Max Size:       32                                 
*** Done ***             
$ dmesg | egrep 'amd|Command line:'
[    0.000000] Linux version 6.11.0-29-generic (buildd@lcy02-amd64-008) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 26 14:16:59 UTC 2 (Ubuntu 6.11.0-29.29~24.04.1-generic 6.11.11)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.11.0-29-generic root=UUID=4cf4db79-a163-471a-b3a5-cff735c89944 ro quiet splash
[    0.295680] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    3.759794] kvm_amd: TSC scaling supported
[    3.759796] kvm_amd: Nested Virtualization enabled
[    3.759797] kvm_amd: Nested Paging enabled
[    3.759798] kvm_amd: LBR virtualization supported
[    3.759802] kvm_amd: Virtual VMLOAD VMSAVE supported
[    3.759802] kvm_amd: Virtual GIF supported
[    3.759802] kvm_amd: Virtual NMI enabled
[    3.786382] amdkcl: loading out-of-tree module taints kernel.
[    3.786386] amdkcl: module verification failed: signature and/or required key missing - tainting kernel
[    3.904258] amd_atl: AMD Address Translation Library initialized
[    5.084995] [drm] amdgpu kernel modesetting enabled.
[    5.084997] [drm] amdgpu version: 6.16.13
[    5.086411] amdgpu: Virtual CRAT table created for CPU
[    5.086420] amdgpu: Topology: Add CPU node
[    5.089203] amdgpu 0000:c6:00.0: enabling device (0006 -> 0007)
[    5.089239] amdgpu 0000:c6:00.0: amdgpu: initializing kernel modesetting (IP DISCOVERY 0x1002:0x1586 0x2014:0x801D 0xC1).
[    5.089248] amdgpu 0000:c6:00.0: amdgpu: register mmio base: 0xA0200000
[    5.089249] amdgpu 0000:c6:00.0: amdgpu: register mmio size: 1048576
[    5.093064] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 0 <common_v1_0_0> (soc21_common)
[    5.093066] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 1 <gmc_v11_0_0> (gmc_v11_0)
[    5.093067] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 2 <ih_v6_0_0> (ih_v6_1)
[    5.093068] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 3 <psp_v13_0_0> (psp)
[    5.093069] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 4 <smu_v14_0_0> (smu)
[    5.093070] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 5 <dce_v1_0_0> (dm)
[    5.093071] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 6 <gfx_v11_0_0> (gfx_v11_0)
[    5.093072] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 7 <sdma_v6_0_0> (sdma_v6_0)
[    5.093073] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 8 <vcn_v4_0_5> (vcn_v4_0_5)
[    5.093074] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0_5> (jpeg_v4_0_5)
[    5.093074] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 10 <mes_v11_0_0> (mes_v11_0)
[    5.093075] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 11 <vpe_v6_1_0> (vpe_v6_1)
[    5.093080] amdgpu 0000:c6:00.0: amdgpu: Fetched VBIOS from VFCT
[    5.093081] amdgpu: ATOM BIOS: 113-STRXLGEN-001
[    5.099505] amdgpu 0000:c6:00.0: amdgpu: VPE: collaborate mode true
[    5.099511] amdgpu 0000:c6:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[    5.099548] amdgpu 0000:c6:00.0: amdgpu: vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    5.099571] amdgpu 0000:c6:00.0: amdgpu: VRAM: 98304M 0x0000008000000000 - 0x00000097FFFFFFFF (98304M used)
[    5.099572] amdgpu 0000:c6:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[    5.099665] amdgpu 0000:c6:00.0: amdgpu: amdgpu: 98304M of VRAM memory ready
[    5.099668] amdgpu 0000:c6:00.0: amdgpu: amdgpu: 15864M of GTT memory ready.
[    5.100782] amdgpu 0000:c6:00.0: amdgpu: [drm] Loading DMUB firmware via PSP: version=0x09000F00
[    5.101174] amdgpu 0000:c6:00.0: amdgpu: [VCN instance 0] Found VCN firmware Version ENC: 1.23 DEC: 9 VEP: 0 Revision: 3
[    5.101218] amdgpu 0000:c6:00.0: amdgpu: [VCN instance 1] Found VCN firmware Version ENC: 1.23 DEC: 9 VEP: 0 Revision: 3
[    5.124731] amdgpu 0000:c6:00.0: amdgpu: reserve 0x8c00000 from 0x97e0000000 for PSP TMR
[    5.458130] amdgpu 0000:c6:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    5.462129] amdgpu 0000:c6:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    5.462130] amdgpu 0000:c6:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
[    5.497123] amdgpu 0000:c6:00.0: amdgpu: SMU is initialized successfully!
[    5.499425] amdgpu 0000:c6:00.0: amdgpu: [drm] Display Core v3.2.359 initialized on DCN 3.5.1
[    5.499427] amdgpu 0000:c6:00.0: amdgpu: [drm] DP-HDMI FRL PCON supported
[    5.502250] amdgpu 0000:c6:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x09000F00
[    5.505039] snd_hda_intel 0000:c6:00.1: bound 0000:c6:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[    5.505805] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.505927] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.506075] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.506226] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.506374] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.506448] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.506516] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.506589] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.506668] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
[    5.507964] amdgpu 0000:c6:00.0: amdgpu: MES FW version must be >= 0x7f to enable LR compute workaround.
[    5.657735] amdgpu: HMM registered 98304MB device memory
[    5.658852] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    5.658868] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[    5.659443] amdgpu: Virtual CRAT table created for GPU
[    5.661382] amdgpu: Topology: Add dGPU node [0x1586:0x1002]
[    5.661384] kfd kfd: amdgpu: added device 1002:1586
[    5.661396] amdgpu 0000:c6:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 10, active_cu_number 40
[    5.661401] amdgpu 0000:c6:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    5.661402] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    5.661403] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    5.661404] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[    5.661404] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[    5.661405] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[    5.661405] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[    5.661406] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[    5.661406] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[    5.661407] amdgpu 0000:c6:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    5.661408] amdgpu 0000:c6:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[    5.661408] amdgpu 0000:c6:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
[    5.661409] amdgpu 0000:c6:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 4 on hub 8
[    5.661410] amdgpu 0000:c6:00.0: amdgpu: ring jpeg_dec_1 uses VM inv eng 6 on hub 8
[    5.661410] amdgpu 0000:c6:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[    5.661411] amdgpu 0000:c6:00.0: amdgpu: ring vpe uses VM inv eng 7 on hub 8
[    5.662609] amdgpu 0000:c6:00.0: amdgpu: Runtime PM not available
[    5.663331] [drm] Initialized amdgpu 3.64.0 for 0000:c6:00.0 on minor 0
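The dmesg output above confirms the kernel created the KFD node, so the "never opens /dev/kfd" observation from the summary is about the runner process, not the driver. One way to check that claim for any running process is to inspect its open file descriptors under `/proc`. A minimal sketch (Linux-only; the helper name is illustrative, and the runner's PID would come from the ollama server log or `pgrep -f 'ollama runner'`):

```python
import os


def has_kfd_open(pid):
    """Return True if the given process holds an open fd on /dev/kfd."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        return any(
            os.readlink(os.path.join(fd_dir, fd)) == "/dev/kfd"
            for fd in os.listdir(fd_dir)
        )
    except OSError:
        # Process exited, or we lack permission to inspect it.
        return False


# Example: inspect the current process, which has not touched /dev/kfd.
print(has_kfd_open(os.getpid()))
```

On an affected system, running this against the verification-probe runner's PID during its short lifetime should print `False`, matching the report.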
$ rocm-smi -a


============================ ROCm System Management Interface ============================
============================== Version of System Component ===============================
Driver version: 6.16.13
==========================================================================================
=========================================== ID ===========================================
GPU[0]          : Device Name:          AMD Radeon Graphics
GPU[0]          : Device ID:            0x1586
GPU[0]          : Device Rev:           0xc1
GPU[0]          : Subsystem ID:         -0x7fe3
GPU[0]          : GUID:                 51834
==========================================================================================
======================================= Unique ID ========================================
GPU[0]          : Unique ID: 0x0
==========================================================================================
========================================= VBIOS ==========================================
GPU[0]          : VBIOS version: 113-STRXLGEN-001
==========================================================================================
====================================== Temperature =======================================
GPU[0]          : Temperature (Sensor edge) (C): 31.0
==========================================================================================
=============================== Current clock frequencies ================================
GPU[0]          : sclk clock level: 1: (605Mhz)
==========================================================================================
=================================== Current Fan Metric ===================================
GPU[0]          : Not supported
==========================================================================================
================================= Show Performance Level =================================
GPU[0]          : Performance Level: auto
==========================================================================================
==================================== OverDrive Level =====================================
GPU[0]          : get_overdrive_level_sclk, Not supported on the given system
==========================================================================================
==================================== OverDrive Level =====================================
GPU[0]          : get_mem_overdrive_level_mclk, Not supported on the given system
==========================================================================================
======================================= Power Cap ========================================
GPU[0]          : get_power_cap, Not supported on the given system
GPU[0]          : Max Graphics Package Power Unsupported
==========================================================================================
================================== Show Power Profiles ===================================
GPU[0]          : get_power_profiles, Not supported on the given system
==========================================================================================
=================================== Power Consumption ====================================
GPU[0]          : Current Socket Graphics Package Power (W): 4.051
==========================================================================================
============================== Supported clock frequencies ===============================
GPU[0]          : Clock [dcefclk] on device [0] exists but EMPTY! Likely driver error!
GPU[0]          : Clock [fclk] on device [0] exists but EMPTY! Likely driver error!
GPU[0]          : Clock [mclk] on device [0] exists but EMPTY! Likely driver error!
GPU[0]          : Supported sclk frequencies on GPU0
GPU[0]          : 0: 600Mhz
GPU[0]          : 1: 605Mhz *
GPU[0]          : 2: 2900Mhz
GPU[0]          : 
GPU[0]          : Clock [socclk] on device [0] exists but EMPTY! Likely driver error!
------------------------------------------------------------------------------------------
==========================================================================================
=================================== % time GPU is busy ===================================
GPU[0]          : GPU use (%): 0
==========================================================================================
=================================== Current Memory Use ===================================
GPU[0]          : GPU Memory Allocated (VRAM%): 0
GPU[0]          : Memory Activity: N/A
GPU[0]          : Not supported on the given system
==========================================================================================
===================================== Memory Vendor ======================================
GPU[0]          : get_vram_vendor, Not supported on the given system
==========================================================================================
================================== PCIe Replay Counter ===================================
GPU[0]          : PCIe Replay Count, Not supported on the given system
==========================================================================================
===================================== Serial Number ======================================
GPU[0]          : get_serial_number, Not supported on the given system
GPU[0]          : Serial Number: N/A
==========================================================================================
===================================== KFD Processes ======================================
No KFD PIDs currently running
==========================================================================================
================================== GPUs Indexed by PID ===================================
No KFD PIDs currently running
==========================================================================================
======================= GPU Memory clock frequencies and voltages ========================
GPU[0]          : OD_SCLK:
GPU[0]          : 0: 600Mhz
GPU[0]          : 1: 2900Mhz
GPU[0]          : OD_MCLK:
GPU[0]          : 0: 18446744073709Mhz
GPU[0]          : 1: 18446744073709Mhz
GPU[0]          : OD_RANGE:
GPU[0]          : SCLK:     600Mhz        2900Mhz
==========================================================================================
==================================== Current voltage =====================================
GPU[0]          : Voltage (mV): 0
==========================================================================================
======================================= PCI Bus ID =======================================
GPU[0]          : PCI Bus: 0000:C6:00.0
==========================================================================================
================================== Firmware Information ==================================
GPU[0]          : ASD firmware version:         0x210000e8
GPU[0]          : ME firmware version:          29
GPU[0]          : MEC firmware version:         26
GPU[0]          : MES firmware version:         0x0000006e
GPU[0]          : MES KIQ firmware version:     0x0000006c
GPU[0]          : PFP firmware version:         39
GPU[0]          : RLC firmware version:         290653441
GPU[0]          : SDMA firmware version:        14
GPU[0]          : SMC firmware version:         10.100.02.00
GPU[0]          : VCN firmware version:         0x09117003
==========================================================================================
====================================== Product Info ======================================
GPU[0]          : Card Series:          AMD Radeon Graphics
GPU[0]          : Card Model:           0x1586
GPU[0]          : Card Vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]          : Card SKU:             STRXLGEN
GPU[0]          : Subsystem ID:         -0x7fe3
GPU[0]          : Device Rev:           0xc1
GPU[0]          : Node ID:              1
GPU[0]          : GUID:                 51834
GPU[0]          : GFX Version:          gfx1151
==========================================================================================
======================================= Pages Info =======================================
GPU[0]          : ras, Not supported on the given system
================================= Show Valid sclk Range ==================================
GPU[0]          : Valid sclk range: 600Mhz - 2900Mhz
==========================================================================================
================================= Show Valid mclk Range ==================================
GPU[0]          : Unable to display mclk range
==========================================================================================
================================ Show Valid voltage Range ================================
WARNING: GPU[0] : Voltage curve regions unsupported.
==========================================================================================
================================== Voltage Curve Points ==================================
==========================================================================================
==================================== Consumed Energy =====================================
GPU[0]          : % Energy Counter, Unexpected data received
==========================================================================================
=============================== Current Compute Partition ================================
GPU[0]          : Not supported on the given system
==========================================================================================
================================ Current Memory Partition ================================
GPU[0]          : Not supported on the given system
==========================================================================================
====================================== GPU Metrics =======================================
GPU[0]          : Failed to retrieve GPU metrics, metric version may not be supported for this device.
==========================================================================================
================================== End of ROCm SMI Log ===================================
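The verification-probe behavior described in the summary (poll the runner's `/health` endpoint, then filter the device if it never reports ready) can be sketched with a stub. This is an illustration of the observed behavior, not Ollama's actual scheduler code: the status values (2 = still initializing, 1 = ready) and the roughly 100 ms deadline are taken from this report, and the stub server below stands in for a runner that stays stuck at status 2.

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StuckRunner(BaseHTTPRequestHandler):
    """Stub runner whose /health always reports status 2 (initializing)."""

    def do_GET(self):
        body = json.dumps({"status": 2, "progress": 0}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def wait_until_ready(port, deadline_s=0.1):
    """Poll /health until status == 1 (ready) or the deadline passes."""
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as r:
            if json.load(r)["status"] == 1:
                return True
        time.sleep(0.01)
    # At this point the real scheduler logs:
    # "filtering device which didn't fully initialize"
    return False


server = HTTPServer(("127.0.0.1", 0), StuckRunner)
threading.Thread(target=server.serve_forever, daemon=True).start()
ready = wait_until_ready(server.server_address[1])
server.shutdown()
print("device kept" if ready else "device filtered")
```

Because the stub never reaches status 1, the poll loop always times out, which is exactly the failure mode reported: the GPU is discovered, but the verification pass filters it out.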
<!-- gh-comment-id:4207212951 --> @rick-github commented on GitHub (Apr 8, 2026): ```console $ docker run --rm --device /dev/kfd --device /dev/dri -e OLLAMA_DEBUG=2 ollama/ollama:0.20.3-rocm Couldn't find '/root/.ollama/id_ed25519'. Generating new private key. Your new public key is: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBIatb+Fu9PNt4ex9/iczk9AEAA1AunFxG3VkMg+1Tcc time=2026-04-08T14:46:15.000Z level=INFO source=routes.go:1744 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2026-04-08T14:46:15.000Z level=INFO source=routes.go:1746 msg="Ollama cloud disabled: false" time=2026-04-08T14:46:15.000Z level=INFO source=images.go:499 msg="total blobs: 0" time=2026-04-08T14:46:15.000Z level=INFO source=images.go:506 msg="total unused blobs removed: 0" time=2026-04-08T14:46:15.000Z level=INFO source=routes.go:1802 msg="Listening on [::]:11434 (version 0.20.3)" time=2026-04-08T14:46:15.000Z level=DEBUG source=sched.go:145 msg="starting llm 
scheduler" time=2026-04-08T14:46:15.000Z level=INFO source=runner.go:67 msg="discovering available GPUs..." time=2026-04-08T14:46:15.000Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/rocm]" extraEnvs=map[] time=2026-04-08T14:46:15.000Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36913" time=2026-04-08T14:46:15.001Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm time=2026-04-08T14:46:15.007Z level=INFO source=runner.go:1417 msg="starting ollama engine" time=2026-04-08T14:46:15.008Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:36913" time=2026-04-08T14:46:15.019Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string time=2026-04-08T14:46:15.019Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.file_type default=0 time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.name default="" time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.description default="" time=2026-04-08T14:46:15.019Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2026-04-08T14:46:15.019Z level=DEBUG source=ggml.go:94 msg="ggml 
backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2026-04-08T14:46:15.022Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0 load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so time=2026-04-08T14:46:15.074Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.block_count default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.pooling_type default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.expert_count default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2026-04-08T14:46:15.074Z level=DEBUG 
source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.block_count default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.embedding_length default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.head_count default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.key_length default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2026-04-08T14:46:15.074Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2026-04-08T14:46:15.074Z level=DEBUG source=runner.go:1392 msg="dummy model load took" duration=62.205584ms ggml_hip_get_device_memory searching for device 
0000:c6:00.0 ggml_backend_cuda_device_get_memory device 0000:c6:00.0 utilizing AMD specific memory reporting free: 119539990528 total: 119713927168 time=2026-04-08T14:46:15.075Z level=DEBUG source=runner.go:1397 msg="gathering device infos took" duration=237.084µs time=2026-04-08T14:46:15.075Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:Radeon 8060S Graphics FilterID: Integrated:true PCIID:0000:c6:00.0 TotalMemory:119713927168 FreeMemory:119539990528 ComputeMajor:17 ComputeMinor:81 DriverMajor:70226 DriverMinor:1 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]" time=2026-04-08T14:46:15.075Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=75.056478ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=map[] time=2026-04-08T14:46:15.075Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1 time=2026-04-08T14:46:15.075Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/rocm description="Radeon 8060S Graphics" compute=gfx1151 id=0 pci_id=0000:c6:00.0 time=2026-04-08T14:46:15.075Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/rocm]" extraEnvs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]" time=2026-04-08T14:46:15.076Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37447" time=2026-04-08T14:46:15.076Z level=DEBUG source=server.go:433 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm ROCR_VISIBLE_DEVICES=0 GGML_CUDA_INIT=1 
time=2026-04-08T14:46:15.084Z level=INFO source=runner.go:1417 msg="starting ollama engine" time=2026-04-08T14:46:15.084Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:37447" time=2026-04-08T14:46:15.087Z level=DEBUG source=gguf.go:604 msg=general.architecture type=string time=2026-04-08T14:46:15.087Z level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.file_type default=0 time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.name default="" time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.description default="" time=2026-04-08T14:46:15.087Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2026-04-08T14:46:15.087Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2026-04-08T14:46:15.090Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: ggml_cuda_init: initializing rocBLAS on device 0 ggml_cuda_init: rocBLAS initialized on device 0 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0 load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so time=2026-04-08T14:46:15.529Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 
CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.block_count default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.pooling_type default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.expert_count default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2026-04-08T14:46:15.529Z 
level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.block_count default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.embedding_length default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.head_count default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.key_length default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2026-04-08T14:46:15.529Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2026-04-08T14:46:15.529Z level=DEBUG source=runner.go:1392 msg="dummy model load took" duration=442.444552ms ggml_hip_get_device_memory searching for device 0000:c6:00.0 ggml_backend_cuda_device_get_memory device 0000:c6:00.0 utilizing AMD specific memory reporting free: 119220338688 total: 119713927168 time=2026-04-08T14:46:15.529Z level=DEBUG source=runner.go:1397 msg="gathering device infos took" duration=222.874µs time=2026-04-08T14:46:15.530Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:Radeon 8060S Graphics FilterID: Integrated:true PCIID:0000:c6:00.0 TotalMemory:119713927168 FreeMemory:119220338688 ComputeMajor:17 ComputeMinor:81 DriverMajor:70226 DriverMinor:1 
LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]" time=2026-04-08T14:46:15.530Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=454.482997ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]" time=2026-04-08T14:46:15.530Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[ROCm:map[/usr/lib/ollama/rocm:map[0:0]]] time=2026-04-08T14:46:15.530Z level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=0 new_ID=0 time=2026-04-08T14:46:15.530Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=529.880267ms time=2026-04-08T14:46:15.530Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="Radeon 8060S Graphics" libdirs=ollama,rocm driver=70226.1 pci_id=0000:c6:00.0 type=iGPU total="111.5 GiB" available="111.3 GiB" time=2026-04-08T14:46:15.530Z level=INFO source=routes.go:1852 msg="vram-based default context" total_vram="111.5 GiB" default_num_ctx=262144 ``` ```console $ cat /opt/rocm-7.2.1/.info/version 7.2.1 ``` ```console $ rocminfo ROCk module version 6.16.13 is loaded ===================== HSA System Attributes ===================== Runtime Version: 1.18 Runtime Ext Version: 1.15 System Timestamp Freq.: 1000.000000MHz Sig. 
Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE System Endianness: LITTLE Mwaitx: DISABLED XNACK enabled: NO DMAbuf Support: YES VMM Support: YES ========== HSA Agents ========== ******* Agent 1 ******* Name: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S Uuid: CPU-XX Marketing Name: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 0 Device Type: CPU Cache Info: L1: 49152(0xc000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 5185 BDFID: 0 Internal Node ID: 0 Compute Unit: 32 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Memory Properties: Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 32489676(0x1efc0cc) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 32489676(0x1efc0cc) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 32489676(0x1efc0cc) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 4 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 32489676(0x1efc0cc) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info: ******* Agent 2 ******* Name: gfx1151 Uuid: GPU-XX Marketing Name: AMD Radeon Graphics Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 1 Device Type: GPU Cache Info: L1: 32(0x20) KB L2: 2048(0x800) KB L3: 
32768(0x8000) KB Chip ID: 5510(0x1586) ASIC Revision: 0(0x0) Cacheline Size: 128(0x80) Max Clock Freq. (MHz): 2900 BDFID: 50688 Internal Node ID: 1 Compute Unit: 40 SIMDs per CU: 2 Shader Engines: 2 Shader Arrs. per Eng.: 2 WatchPts on Addr. Ranges:4 Coherent Host Access: FALSE Memory Properties: APU Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 32(0x20) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 2147483647(0x7fffffff) y 65535(0xffff) z 65535(0xffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 26 SDMA engine uCode:: 14 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 100663296(0x6000000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:2048KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 100663296(0x6000000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:2048KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Recommended Granule:0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx1151 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 2147483647(0x7fffffff) y 65535(0xffff) z 65535(0xffff) FBarrier Max Size: 32 ISA 2 Name: amdgcn-amd-amdhsa--gfx11-generic Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size 
per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 2147483647(0x7fffffff) y 65535(0xffff) z 65535(0xffff) FBarrier Max Size: 32 *** Done *** ``` ```console $ dmesg | egrep 'amd|Command line:' [ 0.000000] Linux version 6.11.0-29-generic (buildd@lcy02-amd64-008) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 26 14:16:59 UTC 2 (Ubuntu 6.11.0-29.29~24.04.1-generic 6.11.11) [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.11.0-29-generic root=UUID=4cf4db79-a163-471a-b3a5-cff735c89944 ro quiet splash [ 0.295680] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank). [ 3.759794] kvm_amd: TSC scaling supported [ 3.759796] kvm_amd: Nested Virtualization enabled [ 3.759797] kvm_amd: Nested Paging enabled [ 3.759798] kvm_amd: LBR virtualization supported [ 3.759802] kvm_amd: Virtual VMLOAD VMSAVE supported [ 3.759802] kvm_amd: Virtual GIF supported [ 3.759802] kvm_amd: Virtual NMI enabled [ 3.786382] amdkcl: loading out-of-tree module taints kernel. [ 3.786386] amdkcl: module verification failed: signature and/or required key missing - tainting kernel [ 3.904258] amd_atl: AMD Address Translation Library initialized [ 5.084995] [drm] amdgpu kernel modesetting enabled. [ 5.084997] [drm] amdgpu version: 6.16.13 [ 5.086411] amdgpu: Virtual CRAT table created for CPU [ 5.086420] amdgpu: Topology: Add CPU node [ 5.089203] amdgpu 0000:c6:00.0: enabling device (0006 -> 0007) [ 5.089239] amdgpu 0000:c6:00.0: amdgpu: initializing kernel modesetting (IP DISCOVERY 0x1002:0x1586 0x2014:0x801D 0xC1). 
[ 5.089248] amdgpu 0000:c6:00.0: amdgpu: register mmio base: 0xA0200000 [ 5.089249] amdgpu 0000:c6:00.0: amdgpu: register mmio size: 1048576 [ 5.093064] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 0 <common_v1_0_0> (soc21_common) [ 5.093066] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 1 <gmc_v11_0_0> (gmc_v11_0) [ 5.093067] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 2 <ih_v6_0_0> (ih_v6_1) [ 5.093068] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 3 <psp_v13_0_0> (psp) [ 5.093069] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 4 <smu_v14_0_0> (smu) [ 5.093070] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 5 <dce_v1_0_0> (dm) [ 5.093071] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 6 <gfx_v11_0_0> (gfx_v11_0) [ 5.093072] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 7 <sdma_v6_0_0> (sdma_v6_0) [ 5.093073] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 8 <vcn_v4_0_5> (vcn_v4_0_5) [ 5.093074] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0_5> (jpeg_v4_0_5) [ 5.093074] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 10 <mes_v11_0_0> (mes_v11_0) [ 5.093075] amdgpu 0000:c6:00.0: amdgpu: detected ip block number 11 <vpe_v6_1_0> (vpe_v6_1) [ 5.093080] amdgpu 0000:c6:00.0: amdgpu: Fetched VBIOS from VFCT [ 5.093081] amdgpu: ATOM BIOS: 113-STRXLGEN-001 [ 5.099505] amdgpu 0000:c6:00.0: amdgpu: VPE: collaborate mode true [ 5.099511] amdgpu 0000:c6:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default) [ 5.099548] amdgpu 0000:c6:00.0: amdgpu: vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit [ 5.099571] amdgpu 0000:c6:00.0: amdgpu: VRAM: 98304M 0x0000008000000000 - 0x00000097FFFFFFFF (98304M used) [ 5.099572] amdgpu 0000:c6:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF [ 5.099665] amdgpu 0000:c6:00.0: amdgpu: amdgpu: 98304M of VRAM memory ready [ 5.099668] amdgpu 0000:c6:00.0: amdgpu: amdgpu: 15864M of GTT 
memory ready. [ 5.100782] amdgpu 0000:c6:00.0: amdgpu: [drm] Loading DMUB firmware via PSP: version=0x09000F00 [ 5.101174] amdgpu 0000:c6:00.0: amdgpu: [VCN instance 0] Found VCN firmware Version ENC: 1.23 DEC: 9 VEP: 0 Revision: 3 [ 5.101218] amdgpu 0000:c6:00.0: amdgpu: [VCN instance 1] Found VCN firmware Version ENC: 1.23 DEC: 9 VEP: 0 Revision: 3 [ 5.124731] amdgpu 0000:c6:00.0: amdgpu: reserve 0x8c00000 from 0x97e0000000 for PSP TMR [ 5.458130] amdgpu 0000:c6:00.0: amdgpu: RAS: optional ras ta ucode is not available [ 5.462129] amdgpu 0000:c6:00.0: amdgpu: RAP: optional rap ta ucode is not available [ 5.462130] amdgpu 0000:c6:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available [ 5.497123] amdgpu 0000:c6:00.0: amdgpu: SMU is initialized successfully! [ 5.499425] amdgpu 0000:c6:00.0: amdgpu: [drm] Display Core v3.2.359 initialized on DCN 3.5.1 [ 5.499427] amdgpu 0000:c6:00.0: amdgpu: [drm] DP-HDMI FRL PCON supported [ 5.502250] amdgpu 0000:c6:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x09000F00 [ 5.505039] snd_hda_intel 0000:c6:00.1: bound 0000:c6:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu]) [ 5.505805] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.505927] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.506075] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.506226] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.506374] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.506448] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.506516] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR 
ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.506589] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.506668] amdgpu 0000:c6:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0 [ 5.507964] amdgpu 0000:c6:00.0: amdgpu: MES FW version must be >= 0x7f to enable LR compute workaround. [ 5.657735] amdgpu: HMM registered 98304MB device memory [ 5.658852] kfd kfd: amdgpu: Allocated 3969056 bytes on gart [ 5.658868] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1 [ 5.659443] amdgpu: Virtual CRAT table created for GPU [ 5.661382] amdgpu: Topology: Add dGPU node [0x1586:0x1002] [ 5.661384] kfd kfd: amdgpu: added device 1002:1586 [ 5.661396] amdgpu 0000:c6:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 10, active_cu_number 40 [ 5.661401] amdgpu 0000:c6:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0 [ 5.661402] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ 5.661403] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ 5.661404] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0 [ 5.661404] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0 [ 5.661405] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0 [ 5.661405] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0 [ 5.661406] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0 [ 5.661406] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0 [ 5.661407] amdgpu 0000:c6:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0 [ 5.661408] amdgpu 0000:c6:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8 [ 5.661408] amdgpu 0000:c6:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8 [ 5.661409] amdgpu 0000:c6:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 4 on hub 8 [ 5.661410] amdgpu 0000:c6:00.0: amdgpu: ring jpeg_dec_1 
uses VM inv eng 6 on hub 8 [ 5.661410] amdgpu 0000:c6:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0 [ 5.661411] amdgpu 0000:c6:00.0: amdgpu: ring vpe uses VM inv eng 7 on hub 8 [ 5.662609] amdgpu 0000:c6:00.0: amdgpu: Runtime PM not available [ 5.663331] [drm] Initialized amdgpu 3.64.0 for 0000:c6:00.0 on minor 0 ``` ```console $ rocm-smi -a ============================ ROCm System Management Interface ============================ ============================== Version of System Component =============================== Driver version: 6.16.13 ========================================================================================== =========================================== ID =========================================== GPU[0] : Device Name: AMD Radeon Graphics GPU[0] : Device ID: 0x1586 GPU[0] : Device Rev: 0xc1 GPU[0] : Subsystem ID: -0x7fe3 GPU[0] : GUID: 51834 ========================================================================================== ======================================= Unique ID ======================================== GPU[0] : Unique ID: 0x0 ========================================================================================== ========================================= VBIOS ========================================== GPU[0] : VBIOS version: 113-STRXLGEN-001 ========================================================================================== ====================================== Temperature ======================================= GPU[0] : Temperature (Sensor edge) (C): 31.0 ========================================================================================== =============================== Current clock frequencies ================================ GPU[0] : sclk clock level: 1: (605Mhz) ========================================================================================== =================================== Current Fan Metric =================================== GPU[0] : Not supported 
========================================================================================== ================================= Show Performance Level ================================= GPU[0] : Performance Level: auto ========================================================================================== ==================================== OverDrive Level ===================================== GPU[0] : get_overdrive_level_sclk, Not supported on the given system ========================================================================================== ==================================== OverDrive Level ===================================== GPU[0] : get_mem_overdrive_level_mclk, Not supported on the given system ========================================================================================== ======================================= Power Cap ======================================== GPU[0] : get_power_cap, Not supported on the given system GPU[0] : Max Graphics Package Power Unsupported ========================================================================================== ================================== Show Power Profiles =================================== GPU[0] : get_power_profiles, Not supported on the given system ========================================================================================== =================================== Power Consumption ==================================== GPU[0] : Current Socket Graphics Package Power (W): 4.051 ========================================================================================== ============================== Supported clock frequencies =============================== GPU[0] : Clock [dcefclk] on device [0] exists but EMPTY! Likely driver error! GPU[0] : Clock [fclk] on device [0] exists but EMPTY! Likely driver error! GPU[0] : Clock [mclk] on device [0] exists but EMPTY! Likely driver error! 
GPU[0] : Supported sclk frequencies on GPU0 GPU[0] : 0: 600Mhz GPU[0] : 1: 605Mhz * GPU[0] : 2: 2900Mhz GPU[0] : GPU[0] : Clock [socclk] on device [0] exists but EMPTY! Likely driver error! ------------------------------------------------------------------------------------------ ========================================================================================== =================================== % time GPU is busy =================================== GPU[0] : GPU use (%): 0 ========================================================================================== =================================== Current Memory Use =================================== GPU[0] : GPU Memory Allocated (VRAM%): 0 GPU[0] : Memory Activity: N/A GPU[0] : Not supported on the given system ========================================================================================== ===================================== Memory Vendor ====================================== GPU[0] : get_vram_vendor, Not supported on the given system ========================================================================================== ================================== PCIe Replay Counter =================================== GPU[0] : PCIe Replay Count, Not supported on the given system ========================================================================================== ===================================== Serial Number ====================================== GPU[0] : get_serial_number, Not supported on the given system GPU[0] : Serial Number: N/A ========================================================================================== ===================================== KFD Processes ====================================== No KFD PIDs currently running ========================================================================================== ================================== GPUs Indexed by PID =================================== No KFD PIDs currently running 
========================================================================================== ======================= GPU Memory clock frequencies and voltages ======================== GPU[0] : OD_SCLK: GPU[0] : 0: 600Mhz GPU[0] : 1: 2900Mhz GPU[0] : OD_MCLK: GPU[0] : 0: 18446744073709Mhz GPU[0] : 1: 18446744073709Mhz GPU[0] : OD_RANGE: GPU[0] : SCLK: 600Mhz 2900Mhz ========================================================================================== ==================================== Current voltage ===================================== GPU[0] : Voltage (mV): 0 ========================================================================================== ======================================= PCI Bus ID ======================================= GPU[0] : PCI Bus: 0000:C6:00.0 ========================================================================================== ================================== Firmware Information ================================== GPU[0] : ASD firmware version: 0x210000e8 GPU[0] : ME firmware version: 29 GPU[0] : MEC firmware version: 26 GPU[0] : MES firmware version: 0x0000006e GPU[0] : MES KIQ firmware version: 0x0000006c GPU[0] : PFP firmware version: 39 GPU[0] : RLC firmware version: 290653441 GPU[0] : SDMA firmware version: 14 GPU[0] : SMC firmware version: 10.100.02.00 GPU[0] : VCN firmware version: 0x09117003 ========================================================================================== ====================================== Product Info ====================================== GPU[0] : Card Series: AMD Radeon Graphics GPU[0] : Card Model: 0x1586 GPU[0] : Card Vendor: Advanced Micro Devices, Inc. 
[AMD/ATI] GPU[0] : Card SKU: STRXLGEN GPU[0] : Subsystem ID: -0x7fe3 GPU[0] : Device Rev: 0xc1 GPU[0] : Node ID: 1 GPU[0] : GUID: 51834 GPU[0] : GFX Version: gfx1151 ========================================================================================== ======================================= Pages Info ======================================= GPU[0] : ras, Not supported on the given system ================================= Show Valid sclk Range ================================== GPU[0] : Valid sclk range: 600Mhz - 2900Mhz ========================================================================================== ================================= Show Valid mclk Range ================================== GPU[0] : Unable to display mclk range ========================================================================================== ================================ Show Valid voltage Range ================================ WARNING: GPU[0] : Voltage curve regions unsupported. ========================================================================================== ================================== Voltage Curve Points ================================== ========================================================================================== ==================================== Consumed Energy ===================================== GPU[0] : % Energy Counter, Unexpected data received ========================================================================================== =============================== Current Compute Partition ================================ GPU[0] : Not supported on the given system ========================================================================================== ================================ Current Memory Partition ================================ GPU[0] : Not supported on the given system ========================================================================================== ====================================== GPU Metrics 
======================================= GPU[0] : Failed to retrieve GPU metrics, metric version may not be supported for this device. ========================================================================================== ================================== End of ROCm SMI Log =================================== ```
@ftmng commented on GitHub (Apr 8, 2026):


Thanks for the detailed logs, very helpful!

I see a key difference: your verification probe successfully initializes rocBLAS:

```
ggml_cuda_init: initializing rocBLAS on device 0
ggml_cuda_init: rocBLAS initialized on device 0
```

and takes **454 ms** to complete. On our system, the probe never reaches rocBLAS initialization and is killed after only **~90 ms**.

Comparing our setups:

| | Our system | Your system |
| -- | -- | -- |
| Kernel | 6.17.0-20-generic (Ubuntu 25.10 stock) | 6.11.0-29-generic (Ubuntu 24.04 + ROCm DKMS) |
| amdgpu driver | 6.16.6 (stock kernel module) | 6.16.13 (ROCm DKMS `amdgpu-install`) |
| HSA_OVERRIDE_GFX_VERSION | 11.5.1 (required) | not set |
| Bootstrap probe duration | ~90 ms (killed before rocBLAS init) | ~454 ms (rocBLAS initializes successfully) |
| VRAM | ~96 GiB | ~96 GiB |

It looks like the ROCm DKMS driver (`amdgpu-install` package) provides better gfx1151 support than the stock Ubuntu kernel module — faster GPU initialization that completes within the bootstrap timeout, and no need for `HSA_OVERRIDE_GFX_VERSION`.

For now, `ollama/ollama:0.16.0-rocm` in Docker works on our system with full ROCm acceleration (117 tok/s prompt eval, 4.4 tok/s generation on qwen2.5:72b). We'll try installing the ROCm DKMS driver to see if that fixes 0.20.3 as well.

@ftmng commented on GitHub (Apr 8, 2026):


## Resolved: amdgpu-dkms driver update fixes everything

First, I want to apologize — this turned out not to be an Ollama bug. And thank you @rick-github for taking the time to share your detailed logs. The comparison between our setups pointed us in the right direction.

### The fix

Updating the `amdgpu-dkms` kernel module from **6.16.6** to **6.18.4** resolved all issues. Ollama 0.20.3 now works natively with full ROCm acceleration on gfx1151 — no patches, no Vulkan workaround, no Docker, no `HSA_OVERRIDE_GFX_VERSION` needed.

```bash
# Add the repo (if not already configured)
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/31.10/ubuntu noble main" | sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
sudo apt install amdgpu-dkms
sudo reboot
```

After reboot, the bootstrap probe completes successfully and ROCm is recognized:

```
inference compute id=0 library=ROCm compute=gfx1151 name=ROCm0
  description="Radeon 8060S Graphics" total="111.2 GiB" available="108.8 GiB"
```

### Performance on qwen2.5:72b-instruct-q5_K_M (68 GB model)

| | Prompt eval | Generation |
| -- | -- | -- |
| Before (amdgpu 6.16.6) | 7.5 tok/s (CPU fallback) | 2.1 tok/s |
| After (amdgpu 6.18.4) | 109 tok/s (ROCm) | 4.5 tok/s |
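For scale, the numbers in that table work out to roughly a 14.5× prompt-eval and 2× generation speedup; a quick check (values copied from the table above):

```shell
# Speedup implied by the before/after throughput figures (tok/s).
awk 'BEGIN {
  printf "prompt eval: %.1fx faster\n", 109 / 7.5   # ROCm vs CPU fallback
  printf "generation:  %.1fx faster\n", 4.5 / 2.1
}'
```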

### Tip for other Strix Halo / gfx1151 users

If Ollama shows `filtering device which didn't fully initialize` and falls back to CPU, check your `amdgpu` kernel module version:

```bash
modinfo amdgpu | grep "^version"
```

If it's older than 6.16.13, update via the AMD repository. The stock Ubuntu kernel modules (even on 25.10) may ship an older version where the GPU doesn't fully initialize during Ollama's bootstrap probe.
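That check can be scripted; a minimal sketch, assuming GNU coreutils `sort -V` for version comparison (the 6.16.13 threshold is the one reported in this thread, and the output wording is illustrative):

```shell
# version_ge A B: true if version string A >= version string B.
# Relies on GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Pull the loaded amdgpu module version and compare against the
# minimum that worked in this thread.
ver=$(modinfo amdgpu | awk '/^version:/ {print $2}')
if version_ge "$ver" "6.16.13"; then
    echo "amdgpu $ver: new enough for the bootstrap probe"
else
    echo "amdgpu $ver: older than 6.16.13 -- update via the AMD repository"
fi
```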

@rick-github commented on GitHub (Apr 8, 2026):

Thank you for taking the effort to examine, experiment and find a resolution.

Reference: github-starred/ollama#35619