[GH-ISSUE #13589] gfx1151 (Radeon 8050S) silently falls back to CPU on Linux despite rocminfo detecting GPU #34706

Closed
opened 2026-04-22 18:28:28 -05:00 by GiteaMirror · 20 comments

Originally created by @blue-az on GitHub (Dec 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13589

Issue

On AMD Strix Halo systems with gfx1151 (Radeon 8050S), Ollama silently falls back to CPU inference on Linux even though:

  • rocminfo correctly detects the GPU
  • ROCm 6.4.2 is installed
  • The same hardware works on Windows

System Information

  • Hardware: ASUS ROG Flow Z13 (GZ302EA)
  • CPU/GPU: AMD Ryzen AI MAX 390 w/ Radeon 8050S (gfx1151)
  • OS: Fedora 43, kernel 6.17.12-300.fc43.x86_64
  • ROCm: 6.4.2 (Fedora packages)
  • Ollama: Built from main (commit from Dec 30, 2025, post PR #13196 GTT fix)

Symptoms

$ ollama ps
NAME               ID              SIZE      PROCESSOR    CONTEXT    UNTIL              
granite4:latest    4235724a127c    2.4 GB    100% CPU     4096       4 minutes from now

GPU utilization stays at 1% during inference:

$ cat /sys/class/drm/card1/device/gpu_busy_percent
1
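
For reference, a quick way to watch this utilization live while a prompt runs (a sketch; the card index may differ per system):

watch -n 1 cat /sys/class/drm/card1/device/gpu_busy_percent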

rocminfo Output

GPU is correctly detected:

Agent 2                  
*******                  
  Name:                    gfx1151                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Radeon 8050S Graphics              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Device Type:             GPU                                

ROCm Packages Installed

rocm-runtime-6.4.2-2.fc43.x86_64
rocm-llvm-19-14.rocm6.4.2.fc43.x86_64
rocm-lld-19-14.rocm6.4.2.fc43.x86_64
rocm-clang-libs-19-14.rocm6.4.2.fc43.x86_64

Device Permissions

$ ls -la /dev/kfd /dev/dri/render*
crw-rw-rw-. 1 root render 226, 128 /dev/dri/renderD128
crw-rw-rw-. 1 root render 235,   0 /dev/kfd

What's Been Tried

  1. Built from main (Dec 30, 2025) to include PR #13196 GTT fix - still CPU fallback
  2. HSA_OVERRIDE_GFX_VERSION=11.5.0 - still CPU fallback (a quick way to test such overrides is sketched after this list)
  3. Verified permissions - /dev/kfd and /dev/dri/renderD128 are world-accessible
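
For quick iteration, environment overrides like these can be tested in a one-off foreground run without editing the service (a sketch; assumes ollama is on PATH and the system service is stopped first):

sudo systemctl stop ollama
HSA_OVERRIDE_GFX_VERSION=11.5.0 OLLAMA_DEBUG=2 ollama serve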

Windows Works

The same hardware (dual-boot) works with Ollama on Windows - GPU is utilized. This appears to be Linux-specific.

Related Issues

  • #9553 - gfx1151 ROCm broken on Windows (crashes)
  • #10993 - gfx1151 crashes on Windows
  • #12062 - GTT memory fix (merged, but didn't resolve this)

This issue is different: on Linux, there's no crash or error - it just silently falls back to CPU.

Expected Behavior

Ollama should use the GPU for inference since rocminfo detects gfx1151 correctly and gfx1151 is listed as a supported GPU type.


@rick-github commented on GitHub (Dec 30, 2025):

The server log (https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) will help with debugging.


@blue-az commented on GitHub (Jan 1, 2026):

Here's the server log showing the GPU discovery failure:

$ journalctl -u ollama -b

Jan 01 15:43:33 fedora systemd[1]: Started ollama.service - Ollama Service.
Jan 01 15:43:33 fedora ollama[1547]: time=2026-01-01T15:43:33.266-07:00 level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 ... OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
Jan 01 15:43:33 fedora ollama[1547]: time=2026-01-01T15:43:33.269-07:00 level=INFO source=routes.go:1607 msg="Listening on 127.0.0.1:11434 (version 0.13.5)"
Jan 01 15:43:33 fedora ollama[1547]: time=2026-01-01T15:43:33.271-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
Jan 01 15:43:33 fedora ollama[1547]: time=2026-01-01T15:43:33.272-07:00 level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35241"
Jan 01 15:43:35 fedora ollama[1547]: time=2026-01-01T15:43:35.456-07:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
Jan 01 15:43:36 fedora systemd-coredump[1966]: Process 1705 (ollama) of user 974 dumped core.
                Module libggml-hip.so without build-id.
                Stack trace of thread 1716:
                #0  0x00005560a76d7281 n/a (/usr/local/bin/ollama + 0x380281)
                ...
                #8  0x00007fc25de75617 _ZN4rocr4core7Runtime14VMFaultHandlerElPv (libhsa-runtime64.so.1 + 0x75617)
                #9  0x00007fc25de73ec9 _ZN4rocr4core7Runtime15AsyncEventsLoopEPv (libhsa-runtime64.so.1 + 0x73ec9)
Jan 01 15:43:36 fedora ollama[1547]: time=2026-01-01T15:43:36.665-07:00 level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]" error="runner crashed"
Jan 01 15:43:36 fedora ollama[1547]: time=2026-01-01T15:43:36.666-07:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="27.0 GiB" available="25.2 GiB"
Jan 01 15:43:36 fedora ollama[1547]: time=2026-01-01T15:43:36.666-07:00 level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"

Summary: During GPU discovery, libggml-hip.so crashes with a VM fault in libhsa-runtime64.so.1, causing fallback to CPU.

Environment:

  • Ollama 0.13.5
  • Fedora 43, kernel 6.17.12
  • ROCm 6.4.2 (Fedora packages)
  • GPU: gfx1151 / Radeon 8050S (Strix Halo APU)

rocminfo detects the GPU correctly as Agent 2 (gfx1151, 32 CUs, 13.5GB VRAM pool). The crash occurs in Ollama's bundled HIP library when trying to initialize it.
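
If a fuller backtrace would help, the dump that systemd-coredump captured can be pulled with coredumpctl (a sketch; assumes gdb is installed, and symbols may be limited since libggml-hip.so has no build-id):

coredumpctl list ollama
coredumpctl gdb ollama    # at the (gdb) prompt: bt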


@blue-az commented on GitHub (Jan 1, 2026):

Tried HSA_OVERRIDE_GFX_VERSION=11.0.0 workaround - same crash:

Jan 01 15:57:19 fedora ollama[48386]: level=WARN source=runner.go:485 msg="user overrode visible devices" HSA_OVERRIDE_GFX_VERSION=11.0.0
Jan 01 15:57:23 fedora systemd-coredump[48600]: Process 48556 (ollama) of user 974 dumped core.
                Module libggml-hip.so without build-id.
                #8  _ZN4rocr4core7Runtime14VMFaultHandlerElPv (libhsa-runtime64.so.1 + 0x75617)
Jan 01 15:57:23 fedora ollama[48386]: msg="failure during GPU discovery" error="runner crashed"

The VM fault occurs regardless of the gfx version override. This suggests the issue is deeper than architecture detection - possibly an incompatibility between Ollama's bundled libggml-hip.so and Fedora's ROCm 6.4.2 / kernel 6.17.


@blue-az commented on GitHub (Jan 1, 2026):

Workaround found: Vulkan backend works!

Disabling HIP and enabling Vulkan successfully uses the GPU:

# /etc/systemd/system/ollama.service
Environment="OLLAMA_VULKAN=1"
Environment="HIP_VISIBLE_DEVICES=-1"

Result:

$ ollama ps
NAME               ID              SIZE      PROCESSOR    CONTEXT    UNTIL
granite4:latest    4235724a127c    2.7 GB    100% GPU     4096       4 minutes from now

$ ollama run granite4 "say hello" --verbose
eval rate:            58.73 tokens/s
level=INFO msg="inference compute" library=Vulkan name=Vulkan0 description="Radeon 8050S Graphics (RADV GFX1151)" type=iGPU total="17.5 GiB" available="17.2 GiB"

The underlying issue remains: HIP/ROCm backend crashes with VM fault on gfx1151. But Vulkan is a viable workaround for now.


@snippetsBySam commented on GitHub (Jan 2, 2026):

Have you tried using ROCm 7.1.1? I have a GPU from the same family (Radeon 8060S) and it is detected properly (logs below). Also, I noticed that HSA_OVERRIDE_GFX_VERSION is missing from the first set of logs you provided - could it be a configuration error on your end?

time=2025-12-31T02:51:57.859Z level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.5.1 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-12-31T02:51:57.859Z level=INFO source=images.go:493 msg="total blobs: 0"
time=2025-12-31T02:51:57.859Z level=INFO source=images.go:500 msg="total unused blobs removed: 0"
time=2025-12-31T02:51:57.859Z level=INFO source=routes.go:1607 msg="Listening on [::]:11434 (version 0.13.5)"
time=2025-12-31T02:51:57.859Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-12-31T02:51:57.860Z level=WARN source=runner.go:485 msg="user overrode visible devices" HSA_OVERRIDE_GFX_VERSION=11.5.1
time=2025-12-31T02:51:57.860Z level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2025-12-31T02:51:57.860Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36447"
time=2025-12-31T02:51:58.627Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 46081"
time=2025-12-31T02:51:59.299Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="1.0 GiB" available="241.4 MiB"
time=2025-12-31T02:51:59.299Z level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="1.0 GiB" threshold="20.0 GiB"

FYI, I'm running it using the ollama rocm Docker image, so you might also have some success trying that. Here is my docker-compose:

services:
  ollama:
    image: ollama/ollama:rocm
    hostname: ${HOSTNAME}
    container_name: ollama
    environment:
      - TZ=${TZ}
      - HSA_OVERRIDE_GFX_VERSION=11.5.1
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - "./ollama:/root/.ollama"
    ports:
      - "11434:11434"
    restart: unless-stopped
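
For anyone adapting this compose file, a minimal sketch of bringing it up and confirming GPU detection (assumes the container name ollama as above):

docker compose up -d ollama
docker logs ollama 2>&1 | grep -i "inference compute"
docker exec -it ollama ollama ps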

@rick-github commented on GitHub (Jan 4, 2026):

HSA_OVERRIDE_GFX_VERSION is not necessary for the 8050/8060; Ollama will detect the GPU.

@blue-az Set OLLAMA_DEBUG=2 in the server environment and then post the log from startup up to the inference compute line.
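
One way to do that on a systemd install (a sketch, assuming the service is named ollama):

sudo systemctl edit ollama    # add under [Service]: Environment="OLLAMA_DEBUG=2"
sudo systemctl restart ollama
journalctl -u ollama -b --no-pager > ollama.log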


@blue-az commented on GitHub (Jan 7, 2026):

With OLLAMA_DEBUG=2 enabled, here are the logs from startup to inference:

Environment:

  • GPU: AMD Radeon 8050S (gfx1151) - ASUS ROG Flow Z13
  • Ollama version: 0.13.5
  • OS: Fedora 43
  • Workaround: OLLAMA_VULKAN=1, HIP_VISIBLE_DEVICES=-1

Server startup config:
OLLAMA_VULKAN:true
HIP_VISIBLE_DEVICES:-1

GPU Discovery:
ROCm discovery fails, as expected with HIP_VISIBLE_DEVICES=-1:

ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected

Then successfully detects GPU via Vulkan:

detected GPU: AMD Radeon 8050S (gfx1151)
loaded Vulkan backend from /usr/local/lib/ollama/vulkan/libggml-vulkan.so
GPU.0.NAME: AMD Radeon 8050S (gfx1151)
GPU.0.MEMORY: 11968 MiB

Inference works correctly - model loads and runs on GPU successfully.

The workaround is stable and performs well.


@rick-github commented on GitHub (Jan 7, 2026):

The logs appear to have been misplaced, but if you are satisfied with using Vulkan, feel free to close the issue.


@blue-az commented on GitHub (Jan 7, 2026):

Here are the requested debug logs with OLLAMA_DEBUG=2 set. These show successful GPU detection and inference using the Vulkan backend on AMD Radeon 8050S (gfx1151).

System Configuration:

  • OS: Fedora 43
  • GPU: AMD Radeon 8050S (gfx1151)
  • Ollama: 0.13.5
  • Environment: OLLAMA_VULKAN=1, HIP_VISIBLE_DEVICES=-1, OLLAMA_DEBUG=2

Server Startup

time=2026-01-07T00:54:35.067-07:00 level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:-1 HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-01-07T00:54:35.068-07:00 level=INFO source=images.go:493 msg="total blobs: 20"
time=2026-01-07T00:54:35.068-07:00 level=INFO source=images.go:500 msg="total unused blobs removed: 0"
time=2026-01-07T00:54:35.069-07:00 level=INFO source=routes.go:1607 msg="Listening on 127.0.0.1:11434 (version 0.13.5)"
time=2026-01-07T00:54:35.069-07:00 level=DEBUG source=sched.go:120 msg="starting llm scheduler"

GPU Discovery Process

time=2026-01-07T00:54:35.069-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-01-07T00:54:35.070-07:00 level=WARN source=runner.go:485 msg="user overrode visible devices" HIP_VISIBLE_DEVICES=-1
time=2026-01-07T00:54:35.070-07:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"

time=2026-01-07T00:54:35.103-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)

Vulkan GPU Detection (Successful)

ggml_vulkan: Found 1 Vulkan devices:
load_backend: loaded Vulkan backend from /usr/local/lib/ollama/vulkan/libggml-vulkan.so
time=2026-01-07T00:54:35.412-07:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/vulkan]" devices="[{DeviceID:{ID:00000000-c400-0000-0000-000000000000 Library:Vulkan} Name:Vulkan0 Description:Radeon 8050S Graphics (RADV GFX1151) FilterID: Integrated:true PCIID:0000:c4:00.0 TotalMemory:18811338752 FreeMemory:18144481280 ComputeMajor:0 ComputeMinor:0 DriverMajor:0 DriverMinor:0 LibraryPath:[/usr/local/lib/ollama /usr/local/lib/ollama/vulkan]}]"
time=2026-01-07T00:54:35.412-07:00 level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[Vulkan:map[/usr/local/lib/ollama/vulkan:map[00000000-c400-0000-0000-000000000000:0]]]

Inference Compute Line

time=2026-01-07T00:54:35.412-07:00 level=INFO source=types.go:42 msg="inference compute" id=00000000-c400-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Radeon 8050S Graphics (RADV GFX1151)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:c4:00.0 type=iGPU total="17.5 GiB" available="16.9 GiB"

Model Loading (granite4)

time=2026-01-07T00:54:58.281-07:00 level=INFO source=sched.go:450 msg="gpu memory" id=00000000-c400-0000-0000-000000000000 library=Vulkan available="16.5 GiB" free="16.9 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-01-07T00:54:58.282-07:00 level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="2.0 GiB"
time=2026-01-07T00:54:58.282-07:00 level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="320.0 MiB"
time=2026-01-07T00:54:58.282-07:00 level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="266.7 MiB"

llama_model_load_from_file_impl: using device Vulkan0 (Radeon 8050S Graphics (RADV GFX1151)) (0000:c4:00.0) - 17305 MiB free
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors:      Vulkan0 model buffer size =  1998.84 MiB
load_tensors:  Vulkan_Host model buffer size =   200.98 MiB

llama_kv_cache:    Vulkan0 KV buffer size =   320.00 MiB
llama_context:    Vulkan0 compute buffer size =   201.00 MiB
llama_context: Vulkan_Host compute buffer size =    13.02 MiB

time=2026-01-07T00:55:00.114-07:00 level=INFO source=server.go:1376 msg="llama runner started in 1.83 seconds"
time=2026-01-07T00:55:00.115-07:00 level=DEBUG source=sched.go:529 msg="finished setting up" runner.name=registry.ollama.ai/library/granite4:latest runner.inference="[{ID:00000000-c400-0000-0000-000000000000 Library:Vulkan}]" runner.size="2.5 GiB" runner.vram="2.5 GiB" runner.parallel=1

The Vulkan workaround is working perfectly for gfx1151. GPU acceleration is confirmed with all 41 layers offloaded to the GPU. I'm satisfied with this solution and can close the issue if there's nothing else to investigate regarding the HIP/ROCm crash.


@alexhegit commented on GitHub (Jan 27, 2026):

Same issue with Ubuntu 24.04 + ROCm 7.1.1.

alex@GZ302EA:~/Videos$ ls /opt/rocm
rocm/       rocm-7.1.1/ 
alex@GZ302EA:~/Videos$ uname -a
Linux GZ302EA 6.14.0-24-generic #24~24.04.3-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul  7 16:39:17 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

alex@GZ302EA:~/Videos$ amd-smi 
+------------------------------------------------------------------------------+
| AMD-SMI 26.2.0+021c61fc      amdgpu version: 6.16.6   ROCm version: 7.1.1    |
| VBIOS version: 00107962                                                      |
| Platform: Linux Baremetal                                                    |
|-------------------------------------+----------------------------------------|
| BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
| GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
|=====================================+========================================|
| 0000:c4:00.0    AMD Radeon Graphics | N/A        N/A   0             N/A/0 W |
|   0       0     N/A             N/A | N/A        N/A            1454/8192 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
|==============================================================================|
|  No running processes found                                                  |

Checking Ollama and the GPU:

alex@GZ302EA:~/Videos$ ollama ps
NAME          ID              SIZE      PROCESSOR    CONTEXT    UNTIL              
qwen3:0.6b    7df6b6e09427    1.0 GB    100% CPU     4096       4 minutes from now    
qwen3:8b      e4b5fd7f8af0    5.9 GB    100% CPU     4096       4 minutes from now    
alex@GZ302EA:~/Videos$ rocminfo | grep gfx
  Name:                    gfx1151                            
      Name:                    amdgcn-amd-amdhsa--gfx1151         
      Name:                    amdgcn-amd-amdhsa--gfx11-generic   
alex@GZ302EA:~/Videos$ ollama --version
ollama version is 0.13.5

@Znuff commented on GitHub (Feb 3, 2026):

I'm also having the same issue with ollama 0.15.4 running under Docker (ollama:rocm image) on Ubuntu 24.04, kernel 6.14.0-37-generic, ROCm 7.2.0.

Relevant docker compose section:

services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    volumes:
      - ./ollama:/root/.ollama
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    environment:
      - OLLAMA_DEBUG=2

Initialization log:

ollama  | time=2026-02-03T00:14:22.144Z level=INFO source=routes.go:1631 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
ollama  | time=2026-02-03T00:14:22.145Z level=INFO source=images.go:473 msg="total blobs: 11"
ollama  | time=2026-02-03T00:14:22.145Z level=INFO source=images.go:480 msg="total unused blobs removed: 0"
ollama  | time=2026-02-03T00:14:22.145Z level=INFO source=routes.go:1684 msg="Listening on [::]:11434 (version 0.15.4)"
ollama  | time=2026-02-03T00:14:22.145Z level=DEBUG source=sched.go:121 msg="starting llm scheduler"
ollama  | time=2026-02-03T00:14:22.145Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
ollama  | time=2026-02-03T00:14:22.145Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/rocm]" extraEnvs=map[]
ollama  | time=2026-02-03T00:14:22.146Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 46855"
ollama  | time=2026-02-03T00:14:22.146Z level=DEBUG source=server.go:430 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm
ollama  | time=2026-02-03T00:14:22.154Z level=INFO source=runner.go:1405 msg="starting ollama engine"
ollama  | time=2026-02-03T00:14:22.155Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:46855"
ollama  | time=2026-02-03T00:14:22.158Z level=DEBUG source=gguf.go:589 msg=general.architecture type=string
ollama  | time=2026-02-03T00:14:22.158Z level=DEBUG source=gguf.go:589 msg=tokenizer.ggml.model type=string
ollama  | time=2026-02-03T00:14:22.158Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32
ollama  | time=2026-02-03T00:14:22.158Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32
ollama  | time=2026-02-03T00:14:22.158Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.file_type default=0
ollama  | time=2026-02-03T00:14:22.158Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.name default=""
ollama  | time=2026-02-03T00:14:22.158Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.description default=""
ollama  | time=2026-02-03T00:14:22.158Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
ollama  | time=2026-02-03T00:14:22.158Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
ollama  | load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
ollama  | time=2026-02-03T00:14:22.162Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm
ollama  | /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ollama  | ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ollama  | ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ollama  | ggml_cuda_init: found 1 ROCm devices:
ollama  |   Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
ollama  | load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
ollama  | time=2026-02-03T00:14:22.977Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.block_count default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.pooling_type default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.expert_count default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=tokenizer.ggml.pre default=""
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.block_count default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.embedding_length default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.head_count default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.head_count_kv default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.key_length default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.dimension_count default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.freq_base default=100000
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=llama.rope.scaling.factor default=1
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=runner.go:1380 msg="dummy model load took" duration=820.124159ms
ollama  | ggml_hip_get_device_memory searching for device 0000:c5:00.0
ollama  | ggml_backend_cuda_device_get_memory device 0000:c5:00.0 utilizing AMD specific memory reporting free: 102057496576 total: 102238334976
ollama  | time=2026-02-03T00:14:22.977Z level=DEBUG source=runner.go:1385 msg="gathering device infos took" duration=242.532µs
ollama  | time=2026-02-03T00:14:22.978Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilterID: Integrated:true PCIID:0000:c5:00.0 TotalMemory:102238334976 FreeMemory:102057496576 ComputeMajor:17 ComputeMinor:81 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]"
ollama  | time=2026-02-03T00:14:22.978Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=832.510369ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=map[]
ollama  | time=2026-02-03T00:14:22.978Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
ollama  | time=2026-02-03T00:14:22.978Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/rocm description="AMD Radeon Graphics" compute=gfx1151 id=0 pci_id=0000:c5:00.0
ollama  | time=2026-02-03T00:14:22.978Z level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/usr/lib/ollama /usr/lib/ollama/rocm]" extraEnvs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
ollama  | time=2026-02-03T00:14:22.978Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41363"
ollama  | time=2026-02-03T00:14:22.978Z level=DEBUG source=server.go:430 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm ROCR_VISIBLE_DEVICES=0 GGML_CUDA_INIT=1
ollama  | time=2026-02-03T00:14:22.987Z level=INFO source=runner.go:1405 msg="starting ollama engine"
ollama  | time=2026-02-03T00:14:22.987Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:41363"
ollama  | time=2026-02-03T00:14:22.990Z level=DEBUG source=gguf.go:589 msg=general.architecture type=string
ollama  | time=2026-02-03T00:14:22.990Z level=DEBUG source=gguf.go:589 msg=tokenizer.ggml.model type=string
ollama  | time=2026-02-03T00:14:22.990Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32
ollama  | time=2026-02-03T00:14:22.990Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.alignment default=32
ollama  | time=2026-02-03T00:14:22.990Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.file_type default=0
ollama  | time=2026-02-03T00:14:22.990Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.name default=""
ollama  | time=2026-02-03T00:14:22.990Z level=DEBUG source=ggml.go:298 msg="key with type not found" key=general.description default=""
ollama  | time=2026-02-03T00:14:22.990Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
ollama  | time=2026-02-03T00:14:22.990Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
ollama  | load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
ollama  | time=2026-02-03T00:14:22.993Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm
ollama  | /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ollama  | /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ollama  | ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ollama  | ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ollama  | ggml_cuda_init: found 1 ROCm devices:
ollama  | ggml_cuda_init: initializing rocBLAS on device 0
ollama  | Memory access fault by GPU node-1 (Agent handle: 0x7148306f12c0) on address 0x714850011000. Reason: Page not present or supervisor privilege.
ollama  | time=2026-02-03T00:14:23.726Z level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]" error="runner crashed"
ollama  | time=2026-02-03T00:14:23.727Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices=[]
ollama  | time=2026-02-03T00:14:23.727Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=748.715272ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
ollama  | time=2026-02-03T00:14:23.727Z level=DEBUG source=runner.go:153 msg="filtering device which didn't fully initialize" id=0 libdir=/usr/lib/ollama/rocm pci_id=0000:c5:00.0 library=ROCm
ollama  | time=2026-02-03T00:14:23.727Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[]
ollama  | time=2026-02-03T00:14:23.727Z level=TRACE source=runner.go:183 msg="removing unsupported or overlapping GPU combination" libDir=/usr/lib/ollama/rocm description="AMD Radeon Graphics" compute=gfx1151 pci_id=0000:c5:00.0
ollama  | time=2026-02-03T00:14:23.727Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=1.5815509s
ollama  | time=2026-02-03T00:14:23.727Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="62.4 GiB" available="62.4 GiB"
ollama  | time=2026-02-03T00:14:23.727Z level=INFO source=routes.go:1725 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"

I guess the key line is:

ollama  | Memory access fault by GPU node-1 (Agent handle: 0x7148306f12c0) on address 0x714850011000. Reason: Page not present or supervisor privilege.

amd-smi output:

# amd-smi
+------------------------------------------------------------------------------+
| AMD-SMI 26.2.1+fc0010cf6a    amdgpu version: 6.14.0-37 ROCm version: 7.2.0    |
| VBIOS version: 023.011.000.039.000001                                        |
| Platform: Linux Baremetal                                                    |
|-------------------------------------+----------------------------------------|
| BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
| GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
|=====================================+========================================|
| 0000:c5:00.0    AMD Radeon Graphics | N/A        N/A   0                 N/A |
|   0       0     N/A             N/A | N/A        N/A            154/65536 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+

rocminfo lists:

Agent 2
*******
  Name:                    gfx1151
  Marketing Name:          AMD Radeon Graphics
  Device Type:             GPU

[...]

Agent 3
*******
  Name:                    aie2p
  Marketing Name:          RyzenAI-npu5
  Device Type:             DSP

EDIT: Extra info

For the people who will inevitably read this at some point: setting OLLAMA_VULKAN=1, HIP_VISIBLE_DEVICES=-1, OLLAMA_DEBUG=2 does not fix the issue with the ollama/ollama:rocm image.

You will need to swap to ollama/ollama:latest for OLLAMA_VULKAN=1 to work.

Keep in mind that, while my original issue is with the Docker image, the same memory issue (memory access fault) seems to also happen with ollama running on the host directly.

In addition to this, every time ollama tries to initialize with ROCm, the driver seems to be having a fit:

```
[  441.835360] amdgpu 0000:c5:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:8 pasid:32770)
[  441.835384] amdgpu 0000:c5:00.0: amdgpu:  in process ollama pid 13213 thread ollama pid 13223)
[  441.835392] amdgpu 0000:c5:00.0: amdgpu:   in page starting at address 0x00007c687ea7f000 from client 10
[  441.835400] amdgpu 0000:c5:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800932
[  441.835407] amdgpu 0000:c5:00.0: amdgpu:      Faulty UTCL2 client ID: CPF (0x4)
[  441.835413] amdgpu 0000:c5:00.0: amdgpu:      MORE_FAULTS: 0x0
[  441.835419] amdgpu 0000:c5:00.0: amdgpu:      WALKER_ERROR: 0x1
[  441.835424] amdgpu 0000:c5:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[  441.835430] amdgpu 0000:c5:00.0: amdgpu:      MAPPING_ERROR: 0x1
[  441.835435] amdgpu 0000:c5:00.0: amdgpu:      RW: 0x0
[  441.908221] amdgpu: Freeing queue vital buffer 0x7c666a600000, queue evicted
```
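
To catch these faults live while reproducing, the kernel ring buffer can be followed directly. These are standard util-linux/systemd tools, nothing ollama-specific:

```
# follow the kernel log for amdgpu VM faults while a model loads
sudo dmesg --follow | grep -Ei 'amdgpu|gfxhub|protection_fault'

# equivalent via the journal
journalctl -k -f | grep -i amdgpu
```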

~~I'm wondering if this is an actual issue with the firmware that Ubuntu 24.04 ships. I seem to recall that Fedora had an issue with faulty firmware pushed by AMD a while ago, which was then reverted because stuff was crashing.~~ Nope: switching the firmware to the "reverted" versions didn't fix it.

My last thought is that `ollama` detects the wrong amount of VRAM available, specifically:

```
ggml_backend_cuda_device_get_memory device 0000:c5:00.0 utilizing AMD specific memory reporting free: 102061383680 total: 102238334976
```

But the VRAM/GTT split the driver reports at boot is:

```
[    3.660869] amdgpu 0000:c5:00.0: amdgpu: VRAM: 65536M 0x0000008000000000 - 0x0000008FFFFFFFFF (65536M used)
[    3.660872] amdgpu 0000:c5:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[    3.660892] [drm] Detected VRAM RAM=65536M, BAR=65536M
[    3.660893] [drm] RAM width 256bits LPDDR5
[    3.661019] [drm] amdgpu: 65536M of VRAM memory ready
[    3.661020] [drm] amdgpu: 31966M of GTT memory ready.
[    3.661034] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    3.661716] [drm] PCIE GART of 512M enabled (table at 0x0000008000700000).
[    3.662108] [drm] Loading DMUB firmware via PSP: version=0x09003500
[    3.662487] [drm] Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 27
```

I even tried patching `ggml_hip_get_device_memory` to report a smaller amount (the GTT amount), but it would still crash.
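
For what it's worth, the VRAM/GTT split the driver exposes can also be read straight from sysfs without trawling dmesg. A sketch; the card index varies per machine:

```
# amdgpu memory pools in bytes; adjust cardN to your GPU
for f in mem_info_vram_total mem_info_vram_used mem_info_gtt_total mem_info_gtt_used; do
  printf '%-20s ' "$f"; cat "/sys/class/drm/card1/device/$f"
done
```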


@rick-github commented on GitHub (Feb 3, 2026):

```
ollama  | Memory access fault by GPU node-1 (Agent handle: 0x7148306f12c0) on address 0x714850011000. Reason: Page not present or supervisor privilege.
```

This seems to be a kernel issue with the amdgpu driver. I rolled back to 6.11.0-29-generic and my system has been more (though not completely) reliable.

> My last thought is that ollama detects the wrong amount of VRAM available, specifically:

As of 0.14.0 (#13196), the ollama server combines VRAM and GTT into a single pool through the amdgpu driver.
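
That pooling is easy to sanity-check against the boot log above (my arithmetic, not from the thread):

```
# VRAM + GTT from the boot log, in MiB: matches ollama's reported total
echo $((65536 + 31966))                        # 97502 MiB, about 95.2 GiB
echo $((102238334976 / 1024 / 1024 / 1024))    # 95 GiB, the same pool
```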


@snippetsBySam commented on GitHub (Feb 3, 2026):

ROCm works well for me now on Ubuntu 24.04.3 after updating the kernel to 6.18.7 then reinstalling ROCm 7.2. It might be worth giving that a go


@Znuff commented on GitHub (Feb 4, 2026):

> ROCm works well for me now on Ubuntu 24.04.3 after updating the kernel to 6.18.7 then reinstalling ROCm 7.2. It might be worth giving that a go

Which PPA have you tried to use? The `-HWE` images do not come with a 6.18 kernel so far, and I believe Ubuntu 26.04 (and therefore the next HWE kernels) is waiting for the 7.0 release.


@aklofas commented on GitHub (Mar 1, 2026):

**Confirming on Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151)**

Same issue. Ollama 0.17.4, Linux Mint, kernel 6.17.0-14, system ROCm 7.2.0.

**Root cause detail:** The crash occurs specifically during `rocblas_initialize` — the GPU is detected (1 ROCm device found), but the bundled ROCm 6.3 rocBLAS GPU kernels fault when executed:

```
ggml_cuda_init: found 1 ROCm devices:
ggml_cuda_init: initializing rocBLAS on device 0
Memory access fault by GPU node-1 (Agent handle: 0x...) on address 0x.... Reason: Page not present or supervisor privilege.
```

The bundled ROCm 6.3 libraries can't be replaced with system ROCm 7.2 due to soname mismatches (`libamdhip64.so.6` vs `.so.7`, `librocblas.so.4` vs `.so.5`). Even pointing rocBLAS at system 7.2 kernel files via `ROCBLAS_TENSILE_LIBPATH` doesn't help — the fault is in how the bundled `librocblas.so.4` maps GPU memory, not just the `.hsaco` kernel files.
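
The soname mismatch is straightforward to confirm with `readelf`. A sketch; both library paths are assumptions based on the install layout in the logs above:

```
# sonames the bundled HIP backend links against; expect libamdhip64.so.6 and librocblas.so.4
readelf -d /usr/lib/ollama/rocm/libggml-hip.so | grep NEEDED

# what a system ROCm 7.2 install actually ships; expect .so.7 and .so.5
ls /opt/rocm-7.2.0/lib/libamdhip64.so.* /opt/rocm-7.2.0/lib/librocblas.so.*
```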

`hsa_init()` succeeds with the bundled libraries (both as root and as the `ollama` user), confirming the issue is specifically in rocBLAS GPU kernel execution, not basic HSA/KFD communication.

**Workaround:** `OLLAMA_VULKAN=1` works — all layers on GPU, ~96 GB UMA available. Also set `HIP_VISIBLE_DEVICES=-1` to skip the ROCm probe crash entirely.
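
On a host install managed by systemd, the same workaround can be made persistent with a unit drop-in. A sketch, assuming the stock `ollama.service` unit name:

```
# create an override for the ollama unit...
sudo systemctl edit ollama
# ...and add in the editor that opens:
#   [Service]
#   Environment="OLLAMA_VULKAN=1"
#   Environment="HIP_VISIBLE_DEVICES=-1"
sudo systemctl restart ollama
```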

**What won't fix it:**

- Updating ollama (0.17.0 → 0.17.4 — same bundled ROCm 6.3)
- `HSA_OVERRIDE_GFX_VERSION=11.0.0` (native gfx1151 kernels exist, doesn't help)
- `LD_LIBRARY_PATH` / `LD_PRELOAD` with system ROCm 7.2 (ABI mismatch, segfaults)

**Proper fix:** Ollama needs to bundle ROCm 7.x libraries, or provide a way to use system ROCm installations.


@dhiltgen commented on GitHub (Mar 11, 2026):

Release 0.17.8 updates Linux to ROCm v7, which covers support for this GPU. Please give the [RC a try](https://github.com/ollama/ollama/blob/main/docs/linux.mdx#installing-specific-versions) and let us know if you run into any problems.
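
For reference, pinning a specific release with the official install script looks like this (per the linked docs; the version string here is the RC under discussion):

```
# install a specific version instead of the latest stable
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.8-rc1 sh
```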


@aklofas commented on GitHub (Mar 11, 2026):

Tested **0.17.8-rc1** on Strix Halo (Radeon 8060S, gfx1151) — Linux kernel 6.17.0-14-generic, system ROCm 7.2.0.

**TL;DR:** ROCm backend still page-faults on gfx1151. Same `GCVM_L2_PROTECTION_FAULT` as with ROCm 6.3. Vulkan workaround continues to work on 0.17.8-rc1.

### What I tested

1. **Bundled ROCm 7.2 (default):** GPU discovery times out after 30s, falls back to CPU. Kernel log shows a page fault from the ollama runner process.
2. **System ROCm 7.2 (`LD_LIBRARY_PATH=/opt/rocm-7.2.0/lib`):** Same page fault — runner spins at 100% CPU after the fault. Not a bundled-vs-system lib issue (invocation sketched after this list).
3. **`rocminfo` (system ROCm 7.2):** Works fine — HSA initializes, GPU is detected as `gfx1151`. So the runtime/detection side is fixed.
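
The system-ROCm run in item 2 boils down to the following (a sketch; the ROCm path matches the environment listed below):

```
# run the server against system ROCm 7.2 instead of the bundled libraries
LD_LIBRARY_PATH=/opt/rocm-7.2.0/lib ollama serve
```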

### Kernel log (identical for both bundled and system ROCm 7.2)

```
amdgpu 0000:c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:8 pasid:32795)
amdgpu 0000:c2:00.0: amdgpu:  Process ollama pid 2948468 thread ollama pid 2948468
amdgpu 0000:c2:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800932
amdgpu 0000:c2:00.0: amdgpu:   Faulty UTCL2 client ID: CPF (0x4)
amdgpu 0000:c2:00.0: amdgpu:   WALKER_ERROR: 0x1
amdgpu 0000:c2:00.0: amdgpu:   PERMISSION_FAULTS: 0x3
amdgpu 0000:c2:00.0: amdgpu:   MAPPING_ERROR: 0x1
amdgpu 0000:c2:00.0: amdgpu:   RW: 0x0
```

### Analysis

The ROCm 7.2 update fixes the HSA runtime initialization (`rocminfo` works, GPU detected), but the fault occurs during actual compute dispatch — likely in the rocBLAS GPU kernels when ggml-hip tries to run them. The `CPF` (Command Processor Frontend) client ID and `MAPPING_ERROR` suggest the GPU kernel is accessing an unmapped address, which points to the VGPR size issue in rocr-runtime not being the full picture for gfx1151.

### Environment

- **Hardware:** Framework Desktop, AMD Ryzen AI MAX+ 395, Radeon 8060S (gfx1151, RDNA 3.5 iGPU, UMA)
- **OS:** Linux Mint, kernel 6.17.0-14-generic
- **System ROCm:** 7.2.0 at `/opt/rocm-7.2.0/`
- **Ollama:** 0.17.8-rc1
- **Workaround:** `OLLAMA_VULKAN=1` + `HIP_VISIBLE_DEVICES=-1` still works (~43 tok/s eval on glm-4.7-flash:q8_0)

@Znuff commented on GitHub (Mar 27, 2026):

Same as above: Ubuntu 24.04, `6.17.0-19-generic`, ROCm 7.2.0, same error still.

On: `ollama version is 0.18.3`


@rick-github commented on GitHub (Mar 27, 2026):

AMD [recommends](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/strixhalo.html#required-kernel-version) Linux kernel 6.18.4 or newer for Strix Halo support.


@Znuff commented on GitHub (Apr 17, 2026):

> AMD recommends linux kernel 6.18.4 or newer for Strix Halo support.

Just to add to this: you can **now** run this on Ubuntu 24.04 with the HWE kernel `>= 6.17.0-19.19~24.04.2` (current is `6.17.0-20.20~24.04.1`), **but** you also need the `amdgpu` driver ~`31.10` (i.e. https://repo.radeon.com/amdgpu/31.10/ubuntu), which the documentation from AMD skips mentioning (it still recommends/mentions only `30.30.x`). Thanks to #15420
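
For reference, moving to the HWE kernel track on 24.04 is a single package install (a sketch; the amdgpu 31.10 userspace comes from the repo.radeon.com URL above, and package selection there follows AMD's installer docs):

```
# switch to the HWE kernel on Ubuntu 24.04, then reboot into it
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-24.04
sudo reboot

# afterwards, confirm the running kernel is new enough
uname -r    # expect >= 6.17.0-19.19~24.04.2
```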

This finally makes **ollama** run on ROCm:

```
time=2026-04-17T03:52:06.036+03:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="Radeon 8060S Graphics" libdirs=ollama,rocm driver=70253.21 pci_id=0000:c5:00.0 type=iGPU total="63.0 GiB" available="62.8 GiB"
```

The only issue I'm still having is that it only detects 63 GiB of VRAM, when I have configured 112 GB:

```
# amd-ttm
💻 Current TTM pages limit: 29360128 pages (112.00 GB)
💻 Total system memory: 124.94 GB
```
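
The TTM limit itself checks out (my arithmetic, assuming the usual 4 KiB pages), so the 63 GiB that ollama sees, suspiciously close to half of the 124.94 GB of system memory, likely comes from a different cap than the TTM pages limit:

```
# TTM pages limit converted to GiB (4 KiB pages): 29360128 * 4096 / 2^30
echo $((29360128 * 4096 / 1024 / 1024 / 1024))    # 112
```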

...but for now, I'm happy.

Reference: github-starred/ollama#34706