[GH-ISSUE #8851] Memory access fault by GPU node-1 (Agent handle: 0x5635e3db2590) on address 0x7f189722f000. Reason: Page not present or supervisor privilege. (ollama via docker) #5738

Open
opened 2026-04-12 17:01:33 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @nicoKoehler on GitHub (Feb 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8851

What is the issue?

When running the ollama docker image (specs below) and trying to run any model, ollama itself gives the following error:
ollama runner process has terminated: signal: aborted (core dumped)

The docker container gives a long list of Debug messages, but the core error is:

ollama  | Memory access fault by GPU node-1 (Agent handle: 0x559ee9967590) on address 0x7fe05f439000. Reason: Page not present or supervisor privilege.
ollama  | time=2025-02-05T15:41:26.252Z level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"

The same setup works fine with my RX 550; my assumption is that it is too old for ROCm, so it falls back to CPU.

The memory access fault is not unique to ollama: it also occurs when running the rocm/pytorch containers. I will enquire there as well, but was hoping that folks here might have some ideas.

docker-compose.yml

services:

  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    privileged: true
    environment:
      HSA_OVERRIDE_GFX_VERSION: 11.0.0
      AMD_SERIALIZE_KERNEL: 3
      HIP_VISIBLE_DEVICES: 0
      OLLAMA_DEBUG: 1
      AMD_LOG_LEVEL: 3
    devices:
      - "/dev/kfd"
      - "/dev/dri"
    security_opt:
      - seccomp:unconfined
    cap_add:
      - SYS_PTRACE
    ipc: host
    group_add:
      - video
    volumes:
      - /home/user/.ollama:/root/.ollama
      - /home/user/ollama/models:/usr/share/ollama
    ports:
      - "11434:11434"

rocminfo

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      49152(0xc000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4800                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16080720(0xf55f50) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16080720(0xf55f50) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16080720(0xf55f50) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6700 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      3072(0xc00) KB                     
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29663(0x73df)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2855                               
  BDFID:                   1024                               
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    12566528(0xbfc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32
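Side note for anyone comparing dumps: the agent Name above reflects any active HSA_OVERRIDE_GFX_VERSION (an RX 6700 XT is natively gfx1031; the gfx1100 shown here comes from the 11.0.0 override in the compose file). A minimal sketch, assuming plain rocminfo text, for pulling the reported targets out of a dump:

```python
import re

# Excerpt of the rocminfo dump above (agent and ISA name lines).
SAMPLE = """\
Agent 2
  Name:                    gfx1100
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon RX 6700 XT
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1100
"""

def gfx_targets(text: str) -> "list[str]":
    """Return every gfx target reported as an agent Name (one per GPU).

    ISA entries like amdgcn-amd-amdhsa--gfx1100 are deliberately not
    matched, since the pattern requires the value to start with gfx.
    """
    return re.findall(r"^\s*Name:\s+(gfx\w+)\s*$", text, flags=re.MULTILINE)
```
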

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.5.7-0-ga420a45-dirty

GiteaMirror added the amd, bug labels 2026-04-12 17:01:33 -05:00

@sunarowicz commented on GitHub (May 2, 2025):

Same problem here.

Ollama version: 0.6.6 (installed on the system by the official install script, not in a container)
GPU: Radeon 780M iGPU in a Ryzen 7 78003D (16GB VRAM reserved in BIOS)
ROCm installed
Environment variables used: HSA_OVERRIDE_GFX_VERSION=11.0.0, AMD_LOG_LEVEL=3, OLLAMA_DEBUG=1

rocminfo (truncated):

ROCk module version 6.12.12 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.15
Runtime Ext Version:     1.7
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES
...
*******                  
Agent 2                  
*******                  
  Name:                    gfx1036                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH
...

ollama server messages (truncated):

load_tensors: layer   0 assigned to device ROCm0, is_swa = 0
load_tensors: layer   1 assigned to device ROCm0, is_swa = 0
load_tensors: layer   2 assigned to device ROCm0, is_swa = 0
load_tensors: layer   3 assigned to device ROCm0, is_swa = 0
load_tensors: layer   4 assigned to device ROCm0, is_swa = 0
load_tensors: layer   5 assigned to device ROCm0, is_swa = 0
load_tensors: layer   6 assigned to device ROCm0, is_swa = 0
load_tensors: layer   7 assigned to device ROCm0, is_swa = 0
load_tensors: layer   8 assigned to device ROCm0, is_swa = 0
load_tensors: layer   9 assigned to device ROCm0, is_swa = 0
load_tensors: layer  10 assigned to device ROCm0, is_swa = 0
load_tensors: layer  11 assigned to device ROCm0, is_swa = 0
load_tensors: layer  12 assigned to device ROCm0, is_swa = 0
load_tensors: layer  13 assigned to device ROCm0, is_swa = 0
load_tensors: layer  14 assigned to device ROCm0, is_swa = 0
load_tensors: layer  15 assigned to device ROCm0, is_swa = 0
load_tensors: layer  16 assigned to device ROCm0, is_swa = 0
load_tensors: layer  17 assigned to device ROCm0, is_swa = 0
load_tensors: layer  18 assigned to device ROCm0, is_swa = 0
load_tensors: layer  19 assigned to device ROCm0, is_swa = 0
load_tensors: layer  20 assigned to device ROCm0, is_swa = 0
load_tensors: layer  21 assigned to device ROCm0, is_swa = 0
load_tensors: layer  22 assigned to device ROCm0, is_swa = 0
load_tensors: layer  23 assigned to device ROCm0, is_swa = 0
load_tensors: layer  24 assigned to device ROCm0, is_swa = 0
load_tensors: layer  25 assigned to device ROCm0, is_swa = 0
load_tensors: layer  26 assigned to device ROCm0, is_swa = 0
load_tensors: layer  27 assigned to device ROCm0, is_swa = 0
load_tensors: layer  28 assigned to device ROCm0, is_swa = 0
load_tensors: tensor 'token_embd.weight' (q4_K) (and 0 others) cannot be used with preferred buffer type ROCm_Host, using CPU instead
time=2025-05-02T18:12:30.500+02:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
:3:hip_device_runtime.cpp   :634 : 152053957662d us:   hipGetDevice ( 0x70ce877fd71c ) 
:3:hip_device_runtime.cpp   :642 : 152053957671d us:  hipGetDevice: Returned hipSuccess : 
:3:hip_memory.cpp           :845 : 152053957675d us:   hipMemGetInfo ( 0x70ce877fd930, 0x70ce877fd938 ) 
:3:hip_memory.cpp           :869 : 152053957684d us:  hipMemGetInfo: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :634 : 152053957695d us:   hipGetDevice ( 0x70ce877fd5cc ) 
:3:hip_device_runtime.cpp   :642 : 152053957696d us:  hipGetDevice: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :634 : 152053957697d us:   hipGetDevice ( 0x70ce877fd5cc ) 
:3:hip_device_runtime.cpp   :642 : 152053957698d us:  hipGetDevice: Returned hipSuccess : 
:3:hip_memory.cpp           :703 : 152053957706d us:   hipMalloc ( 0x70ce877fd628, 980104704 ) 
:3:rocdevice.cpp            :2425: 152054017710d us:  Device=0x5d8d5aef70f0, freeMem_ = 0xd7c15da00
:3:hip_memory.cpp           :705 : 152054017732d us:  hipMalloc: Returned hipSuccess : 0x70cd12c00000: duration: 60026d us
:3:hip_device_runtime.cpp   :634 : 152054017748d us:   hipGetDevice ( 0x70ce877fd62c ) 
:3:hip_device_runtime.cpp   :642 : 152054017750d us:  hipGetDevice: Returned hipSuccess : 
:3:hip_memory.cpp           :3310: 152054017755d us:   hipMemset ( 0x70cd1f7cba00, 0, 210 ) 
:3:rocdevice.cpp            :3064: 152054017762d us:  Number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp            :3140: 152054027419d us:  Created SWq=0x70ce9c0d8000 to map on HWq=0x70ce74100000 with size 16384 with priority 1, cooperative: 0
:3:rocdevice.cpp            :3232: 152054027431d us:  acquireQueue refCount: 0x70ce74100000 (1)
:3:devprogram.cpp           :2648: 152054199825d us:  Using Code Object V5.
Memory access fault by GPU node-1 (Agent handle: 0x5d8d5aeee0b0) on address 0x70cee5a30000. Reason: Page not present or supervisor privilege.
time=2025-05-02T18:12:35.513+02:00 level=ERROR source=sched.go:457 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"

@chilman408 commented on GitHub (Jun 24, 2025):

I have the same issue with Ollama in docker running in Unraid with AMD iGPU 780m.

Let me know how I can help with debugging?


@0fflineuser commented on GitHub (Aug 17, 2025):

Same problem here; it used to work.


@0fflineuser commented on GitHub (Aug 17, 2025):

Nevermind, I had HSA_OVERRIDE_GFX_VERSION set to 11.0.0 instead of 10.3.0.
Sry.
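For context (my reading of the naming convention, not something stated in this thread): the override string maps onto the gfx target name as decimal major digits followed by one hex digit each for minor and stepping, so gfx1030 is 10.3.0 and gfx1100 is 11.0.0. The value you set should be a target the bundled ROCm libraries actually ship kernels for, which is why 10.3.0 (gfx1030) is the usual choice for RDNA2 cards and many recent iGPUs. A sketch of that mapping:

```python
def gfx_to_version(gfx: str) -> str:
    """Map a gfx target name to an HSA_OVERRIDE_GFX_VERSION-style string.

    Assumed convention: the last two characters are hex minor/stepping
    digits and the rest is the decimal major version.
    """
    body = gfx.removeprefix("gfx")   # e.g. "1030", "90a"
    major, minor, step = body[:-2], body[-2], body[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(step, 16)}"
```
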


@juanluisbaptiste commented on GitHub (Nov 29, 2025):

> Nevermind, I had HSA_OVERRIDE_GFX_VERSION set to 11.0.0 instead of 10.3.0. Sry.

Where do you set that var? I have the same problem on an AMD Ryzen AI MAX+ 395 with a Radeon 8060S GPU: every time I launch a query on ollama, X restarts most of the time (other times the PC completely freezes), and in the logs I see the same error.


@nicoKoehler commented on GitHub (Nov 29, 2025):

@juanluisbaptiste You can set it either in your Docker Compose file, if you are using a container, or as an environment variable in the OS you are using.
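To make that concrete for the systemd case (the official install script sets up an ollama.service unit; the drop-in path below is the conventional one, not confirmed in this thread):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
```

followed by `systemctl daemon-reload` and `systemctl restart ollama`. In a container setup the variable goes under `environment:` exactly as in the docker-compose.yml at the top of this issue.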


@pcarranza commented on GitHub (Mar 1, 2026):

Thank you all. With this final detail I managed to run ollama under podman on a Beelink SER9 Pro (AMD Ryzen™ AI 9 HX 370); sharing for posterity in case someone else needs this in the future:

After installing the ROCm driver, add your user to the render group with `sudo usermod -aG render $USER`, reboot, then:

podman run --pull newer \
	--security-opt label=type:container_runtime_t \
	--group-add keep-groups \
	--replace \
	--device /dev/kfd \
	--device /dev/dri \
	-e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
	-e OLLAMA_HOST=0.0.0.0:11434 \
	-e OLLAMA_MODELS=/root/.ollama/models \
	-e OLLAMA_DEBUG=2 \
	-v ollama:/root/.ollama \
	-p 11434:11434 \
	--name ollama \
	docker.io/ollama/ollama:rocm

Then in the logs:

...
time=2026-03-01T08:37:30.458Z level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilterID: Integrated:true PCIID:0000:c5:00.0 TotalMemory:35687571456 FreeMemory:35088281600 ComputeMajor:16 ComputeMinor:48 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]"
time=2026-03-01T08:37:30.458Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=1.059048769s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
time=2026-03-01T08:37:30.458Z level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[ROCm:map[/usr/lib/ollama/rocm:map[0:0]]]
time=2026-03-01T08:37:30.458Z level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=0 new_ID=0
time=2026-03-01T08:37:30.458Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=1.859426026s
time=2026-03-01T08:37:30.458Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1030 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="33.2 GiB" available="32.7 GiB"
time=2026-03-01T08:37:30.458Z level=INFO source=routes.go:1768 msg="vram-based default context" total_vram="33.2 GiB" default_num_ctx=32768
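One more note on the render-group step above: membership only takes effect after a fresh login (or the reboot mentioned), and it can be checked programmatically. A small sketch (the helper is mine, not part of any tool in this thread) over /etc/group-style data:

```python
import grp
from collections import namedtuple

# Minimal stand-in for the structs grp.getgrall() returns.
GroupEntry = namedtuple("GroupEntry", "gr_name gr_mem")

def in_group(user, group, groups=None):
    """True if `user` is a supplementary member of `group`.

    render (and usually video) membership is what grants access to
    /dev/kfd and /dev/dri; the user's primary group is not checked here.
    """
    if groups is None:
        groups = grp.getgrall()
    return any(g.gr_name == group and user in g.gr_mem for g in groups)
```

e.g. `in_group("youruser", "render")` on the host before starting the container.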
Reference: github-starred/ollama#5738