[GH-ISSUE #12472] Model runner has unexpectedly stopped (GPU Hang) - Framework Desktop #34047

Open
opened 2026-04-22 17:16:41 -05:00 by GiteaMirror · 19 comments

Originally created by @PDGIII on GitHub (Oct 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12472

What is the issue?

I'm sporadically getting the following error after loading a model and entering a prompt:

Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details
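
(For context: the server is already running with debug logging on; OLLAMA_DEBUG=1 appears in the runner environment in the logs below. On a systemd install that's typically set with a unit override, roughly like this:)

~$ sudo systemctl edit ollama.service
# add under [Service]:
#   Environment="OLLAMA_DEBUG=1"
~$ sudo systemctl daemon-reload && sudo systemctl restart ollama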

And my ROCm driver is loaded:

~$ rocminfo
ROCk module version 6.14.14 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.18
Runtime Ext Version:     1.11
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      49152(0xc000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5187                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32639440(0x1f209d0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    32639440(0x1f209d0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32639440(0x1f209d0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32639440(0x1f209d0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1151                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 5510(0x1586)                       
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2900                               
  BDFID:                   49664                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        2147483647(0x7fffffff)             
    y                        65535(0xffff)                      
    z                        65535(0xffff)                      
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 31                                 
  SDMA engine uCode::      14                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    100663296(0x6000000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    100663296(0x6000000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1151         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)             
        y                        65535(0xffff)                      
        z                        65535(0xffff)                      
      FBarrier Max Size:       32                                 
    ISA 2                    
      Name:                    amdgcn-amd-amdhsa--gfx11-generic   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)             
        y                        65535(0xffff)                      
        z                        65535(0xffff)                      
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    aie2                               
  Uuid:                    AIE-XX                             
  Marketing Name:          AIE-ML                             
  Vendor Name:             AMD                                
  Feature:                 AGENT_DISPATCH                     
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        1(0x1)                             
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          64(0x40)                           
  Queue Type:              SINGLE                             
  Node:                    0                                  
  Device Type:             DSP                                
  Cache Info:              
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          0(0x0)                             
  Max Clock Freq. (MHz):   0                                  
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            0                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:0                                  
  Memory Properties:       
  Features:                AGENT_DISPATCH
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, COARSE GRAINED
      Size:                    32639440(0x1f209d0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65536(0x10000) KB                  
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32639440(0x1f209d0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*** Done ***             
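
The kernel's view of the GPU memory carve-out can be cross-checked through the amdgpu sysfs nodes (a quick sketch, assuming the device is card1, which is the drm node ollama matches in the logs below; values are reported in bytes):

~$ cat /sys/class/drm/card1/device/mem_info_vram_total
~$ cat /sys/class/drm/card1/device/mem_info_vram_used
~$ cat /sys/class/drm/card1/device/mem_info_gtt_total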

Relevant log output

sudo journalctl -f -b -u ollama.service
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.728Z level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.728Z level=DEBUG source=amd_linux.go:203 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=5510 unique_id=0
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.728Z level=DEBUG source=amd_linux.go:237 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card1/device
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.728Z level=DEBUG source=amd_linux.go:343 msg="amdgpu memory" gpu=0 total="96.0 GiB"
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.728Z level=DEBUG source=amd_linux.go:344 msg="amdgpu memory" gpu=0 available="95.9 GiB"
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.728Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/lib/ollama/rocm"
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.729Z level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable /usr/local/lib/ollama/rocm"
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.733Z level=DEBUG source=amd_linux.go:375 msg="rocm supported GPUs" types="[gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102 gfx1151 gfx1200 gfx1201 gfx900 gfx906 gfx908 gfx90a gfx942]"
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.733Z level=INFO source=amd_linux.go:390 msg="amdgpu is supported" gpu=0 gpu_type=gfx1151
Oct 01 20:22:16 llama ollama[2180]: time=2025-10-01T20:22:16.735Z level=INFO source=types.go:131 msg="inference compute" id=0 library=rocm variant="" compute=gfx1151 driver=6.14 name=1002:1586 total="96.0 GiB" available="95.9 GiB"
Oct 01 20:22:36 llama ollama[2180]: [GIN] 2025/10/01 - 20:22:36 | 200 |      69.636µs |       127.0.0.1 | HEAD     "/"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.427Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 01 20:22:36 llama ollama[2180]: [GIN] 2025/10/01 - 20:22:36 | 200 |   77.972254ms |       127.0.0.1 | POST     "/api/show"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.537Z level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="31.1 GiB" before.free="28.6 GiB" before.free_swap="8.0 GiB" now.total="31.1 GiB" now.free="28.6 GiB" now.free_swap="8.0 GiB"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.537Z level=DEBUG source=amd_linux.go:492 msg="updating rocm free memory" gpu=0 name=1002:1586 before="95.9 GiB" now="95.9 GiB"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.537Z level=DEBUG source=sched.go:188 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.560Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.560Z level=DEBUG source=sched.go:208 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.655Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.pooling_type default=0
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.num_channels default=3
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.max_upscaling_size default=448
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.rope.scaling.factor default=1
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.no_rope_interval default=4
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.temperature_tuning default=true
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.scale default=0.10000000149011612
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.floor_scale default=8192
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="31.1 GiB" before.free="28.6 GiB" before.free_swap="8.0 GiB" now.total="31.1 GiB" now.free="28.6 GiB" now.free_swap="8.0 GiB"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=amd_linux.go:492 msg="updating rocm free memory" gpu=0 name=1002:1586 before="95.9 GiB" now="95.9 GiB"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=server.go:324 msg="adding gpu library" path=/usr/local/lib/ollama/rocm
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=server.go:332 msg="adding gpu dependency paths" paths=[/usr/local/lib/ollama/rocm]
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 --port 36499"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.656Z level=DEBUG source=server.go:400 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_HOST=0.0.0.0:11434 OLLAMA_DEBUG=1 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm LD_LIBRARY_PATH=/usr/local/lib/ollama/rocm:/usr/local/lib/ollama/rocm:/usr/local/lib/ollama:/usr/local/lib/ollama ROCR_VISIBLE_DEVICES=0
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=INFO source=server.go:672 msg="loading model" "model layers"=49 requested=-1
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="31.1 GiB" before.free="28.6 GiB" before.free_swap="8.0 GiB" now.total="31.1 GiB" now.free="28.6 GiB" now.free_swap="8.0 GiB"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=DEBUG source=amd_linux.go:492 msg="updating rocm free memory" gpu=0 name=1002:1586 before="95.9 GiB" now="95.9 GiB"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=INFO source=server.go:678 msg="system memory" total="31.1 GiB" free="28.6 GiB" free_swap="8.0 GiB"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=INFO source=server.go:686 msg="gpu memory" id=0 available="95.4 GiB" free="95.9 GiB" minimum="457.0 MiB" overhead="0 B"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.666Z level=INFO source=runner.go:1252 msg="starting ollama engine"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.666Z level=INFO source=runner.go:1287 msg="Server listening on 127.0.0.1:36499"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.669Z level=INFO source=runner.go:1171 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:16 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=INFO source=ggml.go:131 msg="" architecture=llama4 file_type=Q4_K_M name="" description="" num_tensors=1182 num_key_values=45
Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Oct 01 20:22:37 llama ollama[2180]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Oct 01 20:22:37 llama ollama[2180]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Oct 01 20:22:37 llama ollama[2180]: ggml_cuda_init: found 1 ROCm devices:
Oct 01 20:22:37 llama ollama[2180]:   Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
Oct 01 20:22:37 llama ollama[2180]: load_backend: loaded ROCm backend from /usr/local/lib/ollama/libggml-hip.so
Oct 01 20:22:37 llama ollama[2180]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.474Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/rocm
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.474Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.pooling_type default=0
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.num_channels default=3
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.max_upscaling_size default=448
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.rope.scaling.factor default=1
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.no_rope_interval default=4
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.temperature_tuning default=true
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.scale default=0.10000000149011612
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.floor_scale default=8192
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.784Z level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2804 splits=1
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2545 splits=2
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:310 msg="model weights" device=ROCm0 size="62.3 GiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="554.9 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:332 msg="compute graph" device=ROCm0 size="433.1 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="10.0 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=DEBUG source=backend.go:342 msg="total memory" size="64.0 GiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=581898240U required.CPU.Graph=10485760U required.ROCm0.ID=0 required.ROCm0.Weights="[1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 2595370496U]" required.ROCm0.Cache="[16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 0U]" required.ROCm0.Graph=454146304U
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=DEBUG source=server.go:894 msg="available gpu" id=0 "available layer vram"="95.0 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="433.1 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:0 Layers:49(0..48)]"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=INFO source=runner.go:1171 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:16 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.812Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.pooling_type default=0
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.num_channels default=3
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.max_upscaling_size default=448
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.rope.scaling.factor default=1
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.no_rope_interval default=4
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.temperature_tuning default=true
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.scale default=0.10000000149011612
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.floor_scale default=8192
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.954Z level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2804 splits=1
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2545 splits=2
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:310 msg="model weights" device=ROCm0 size="62.3 GiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="554.9 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:332 msg="compute graph" device=ROCm0 size="433.1 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="10.0 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:342 msg="total memory" size="64.0 GiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=581898240A required.CPU.Graph=10485760A required.ROCm0.ID=0 required.ROCm0.Weights="[1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 2595370496A]" required.ROCm0.Cache="[16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 0U]" required.ROCm0.Graph=454146304A
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=server.go:894 msg="available gpu" id=0 "available layer vram"="95.0 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="433.1 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:0 Layers:49(0..48)]"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=runner.go:1171 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:16 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=ggml.go:487 msg="offloading 48 repeating layers to GPU"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=ggml.go:493 msg="offloading output layer to GPU"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=ggml.go:498 msg="offloaded 49/49 layers to GPU"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:310 msg="model weights" device=ROCm0 size="62.3 GiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:315 msg="model weights" device=CPU size="554.9 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:332 msg="compute graph" device=ROCm0 size="433.1 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.0 MiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:342 msg="total memory" size="64.0 GiB"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=sched.go:470 msg="loaded runners" count=1
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
Oct 01 20:22:38 llama ollama[2180]: time=2025-10-01T20:22:38.217Z level=DEBUG source=server.go:1295 msg="model load progress 0.02"
Oct 01 20:22:38 llama ollama[2180]: time=2025-10-01T20:22:38.467Z level=DEBUG source=server.go:1295 msg="model load progress 0.04"
Oct 01 20:22:38 llama ollama[2180]: time=2025-10-01T20:22:38.718Z level=DEBUG source=server.go:1295 msg="model load progress 0.05"
Oct 01 20:22:38 llama ollama[2180]: time=2025-10-01T20:22:38.969Z level=DEBUG source=server.go:1295 msg="model load progress 0.07"
Oct 01 20:22:39 llama ollama[2180]: time=2025-10-01T20:22:39.220Z level=DEBUG source=server.go:1295 msg="model load progress 0.09"
Oct 01 20:22:39 llama ollama[2180]: time=2025-10-01T20:22:39.471Z level=DEBUG source=server.go:1295 msg="model load progress 0.11"
Oct 01 20:22:39 llama ollama[2180]: time=2025-10-01T20:22:39.721Z level=DEBUG source=server.go:1295 msg="model load progress 0.13"
Oct 01 20:22:39 llama ollama[2180]: time=2025-10-01T20:22:39.972Z level=DEBUG source=server.go:1295 msg="model load progress 0.15"
Oct 01 20:22:40 llama ollama[2180]: time=2025-10-01T20:22:40.223Z level=DEBUG source=server.go:1295 msg="model load progress 0.17"
Oct 01 20:22:40 llama ollama[2180]: time=2025-10-01T20:22:40.474Z level=DEBUG source=server.go:1295 msg="model load progress 0.18"
Oct 01 20:22:40 llama ollama[2180]: time=2025-10-01T20:22:40.725Z level=DEBUG source=server.go:1295 msg="model load progress 0.20"
Oct 01 20:22:40 llama ollama[2180]: time=2025-10-01T20:22:40.976Z level=DEBUG source=server.go:1295 msg="model load progress 0.22"
Oct 01 20:22:41 llama ollama[2180]: time=2025-10-01T20:22:41.226Z level=DEBUG source=server.go:1295 msg="model load progress 0.24"
Oct 01 20:22:41 llama ollama[2180]: time=2025-10-01T20:22:41.477Z level=DEBUG source=server.go:1295 msg="model load progress 0.26"
Oct 01 20:22:41 llama ollama[2180]: time=2025-10-01T20:22:41.728Z level=DEBUG source=server.go:1295 msg="model load progress 0.28"
Oct 01 20:22:41 llama ollama[2180]: time=2025-10-01T20:22:41.979Z level=DEBUG source=server.go:1295 msg="model load progress 0.29"
Oct 01 20:22:42 llama ollama[2180]: time=2025-10-01T20:22:42.230Z level=DEBUG source=server.go:1295 msg="model load progress 0.31"
Oct 01 20:22:42 llama ollama[2180]: time=2025-10-01T20:22:42.481Z level=DEBUG source=server.go:1295 msg="model load progress 0.33"
Oct 01 20:22:42 llama ollama[2180]: time=2025-10-01T20:22:42.731Z level=DEBUG source=server.go:1295 msg="model load progress 0.35"
Oct 01 20:22:42 llama ollama[2180]: time=2025-10-01T20:22:42.983Z level=DEBUG source=server.go:1295 msg="model load progress 0.37"
Oct 01 20:22:43 llama ollama[2180]: time=2025-10-01T20:22:43.233Z level=DEBUG source=server.go:1295 msg="model load progress 0.39"
Oct 01 20:22:43 llama ollama[2180]: time=2025-10-01T20:22:43.484Z level=DEBUG source=server.go:1295 msg="model load progress 0.41"
Oct 01 20:22:43 llama ollama[2180]: time=2025-10-01T20:22:43.735Z level=DEBUG source=server.go:1295 msg="model load progress 0.44"
Oct 01 20:22:43 llama ollama[2180]: time=2025-10-01T20:22:43.986Z level=DEBUG source=server.go:1295 msg="model load progress 0.46"
Oct 01 20:22:44 llama ollama[2180]: time=2025-10-01T20:22:44.237Z level=DEBUG source=server.go:1295 msg="model load progress 0.48"
Oct 01 20:22:44 llama ollama[2180]: time=2025-10-01T20:22:44.488Z level=DEBUG source=server.go:1295 msg="model load progress 0.50"
Oct 01 20:22:44 llama ollama[2180]: time=2025-10-01T20:22:44.739Z level=DEBUG source=server.go:1295 msg="model load progress 0.51"
Oct 01 20:22:44 llama ollama[2180]: time=2025-10-01T20:22:44.989Z level=DEBUG source=server.go:1295 msg="model load progress 0.53"
Oct 01 20:22:45 llama ollama[2180]: time=2025-10-01T20:22:45.240Z level=DEBUG source=server.go:1295 msg="model load progress 0.55"
Oct 01 20:22:45 llama ollama[2180]: time=2025-10-01T20:22:45.491Z level=DEBUG source=server.go:1295 msg="model load progress 0.57"
Oct 01 20:22:45 llama ollama[2180]: time=2025-10-01T20:22:45.741Z level=DEBUG source=server.go:1295 msg="model load progress 0.59"
Oct 01 20:22:45 llama ollama[2180]: time=2025-10-01T20:22:45.992Z level=DEBUG source=server.go:1295 msg="model load progress 0.61"
Oct 01 20:22:46 llama ollama[2180]: time=2025-10-01T20:22:46.243Z level=DEBUG source=server.go:1295 msg="model load progress 0.62"
Oct 01 20:22:46 llama ollama[2180]: time=2025-10-01T20:22:46.494Z level=DEBUG source=server.go:1295 msg="model load progress 0.64"
Oct 01 20:22:46 llama ollama[2180]: time=2025-10-01T20:22:46.745Z level=DEBUG source=server.go:1295 msg="model load progress 0.66"
Oct 01 20:22:46 llama ollama[2180]: time=2025-10-01T20:22:46.995Z level=DEBUG source=server.go:1295 msg="model load progress 0.68"
Oct 01 20:22:47 llama ollama[2180]: time=2025-10-01T20:22:47.246Z level=DEBUG source=server.go:1295 msg="model load progress 0.70"
Oct 01 20:22:47 llama ollama[2180]: time=2025-10-01T20:22:47.497Z level=DEBUG source=server.go:1295 msg="model load progress 0.72"
Oct 01 20:22:47 llama ollama[2180]: time=2025-10-01T20:22:47.748Z level=DEBUG source=server.go:1295 msg="model load progress 0.74"
Oct 01 20:22:47 llama ollama[2180]: time=2025-10-01T20:22:47.999Z level=DEBUG source=server.go:1295 msg="model load progress 0.76"
Oct 01 20:22:48 llama ollama[2180]: time=2025-10-01T20:22:48.250Z level=DEBUG source=server.go:1295 msg="model load progress 0.77"
Oct 01 20:22:48 llama ollama[2180]: time=2025-10-01T20:22:48.503Z level=DEBUG source=server.go:1295 msg="model load progress 0.79"
Oct 01 20:22:48 llama ollama[2180]: time=2025-10-01T20:22:48.754Z level=DEBUG source=server.go:1295 msg="model load progress 0.81"
Oct 01 20:22:49 llama ollama[2180]: time=2025-10-01T20:22:49.005Z level=DEBUG source=server.go:1295 msg="model load progress 0.83"
Oct 01 20:22:49 llama ollama[2180]: time=2025-10-01T20:22:49.255Z level=DEBUG source=server.go:1295 msg="model load progress 0.85"
Oct 01 20:22:49 llama ollama[2180]: time=2025-10-01T20:22:49.506Z level=DEBUG source=server.go:1295 msg="model load progress 0.87"
Oct 01 20:22:49 llama ollama[2180]: time=2025-10-01T20:22:49.757Z level=DEBUG source=server.go:1295 msg="model load progress 0.88"
Oct 01 20:22:50 llama ollama[2180]: time=2025-10-01T20:22:50.008Z level=DEBUG source=server.go:1295 msg="model load progress 0.90"
Oct 01 20:22:50 llama ollama[2180]: time=2025-10-01T20:22:50.259Z level=DEBUG source=server.go:1295 msg="model load progress 0.92"
Oct 01 20:22:50 llama ollama[2180]: time=2025-10-01T20:22:50.510Z level=DEBUG source=server.go:1295 msg="model load progress 0.94"
Oct 01 20:22:50 llama ollama[2180]: time=2025-10-01T20:22:50.760Z level=DEBUG source=server.go:1295 msg="model load progress 0.96"
Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.011Z level=DEBUG source=server.go:1295 msg="model load progress 0.98"
Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.262Z level=DEBUG source=server.go:1295 msg="model load progress 0.99"
Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.403Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.pooling_type default=0
Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.514Z level=INFO source=server.go:1289 msg="llama runner started in 14.86 seconds"
Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.514Z level=DEBUG source=sched.go:482 msg="finished setting up" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096
Oct 01 20:22:51 llama ollama[2180]: [GIN] 2025/10/01 - 20:22:51 | 200 | 15.082001029s |       127.0.0.1 | POST     "/api/generate"
Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.514Z level=DEBUG source=sched.go:490 msg="context for request finished"
Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.514Z level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 duration=5m0s
Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.515Z level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 refCount=0
Oct 01 20:22:53 llama ollama[2180]: [GIN] 2025/10/01 - 20:22:53 | 200 |      44.612µs |  100.102.200.84 | GET      "/"
Oct 01 20:22:53 llama ollama[2180]: [GIN] 2025/10/01 - 20:22:53 | 200 |       16.36µs |  100.102.200.84 | GET      "/"
Oct 01 20:23:03 llama ollama[2180]: time=2025-10-01T20:23:03.281Z level=DEBUG source=sched.go:580 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5
Oct 01 20:23:03 llama ollama[2180]: time=2025-10-01T20:23:03.307Z level=DEBUG source=server.go:1388 msg="completion request" images=0 prompt=1727 format=""
Oct 01 20:23:03 llama ollama[2180]: time=2025-10-01T20:23:03.338Z level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=345 used=0 remaining=345
Oct 01 20:23:05 llama ollama[2180]: HW Exception by GPU node-1 (Agent handle: 0x723f9c692ba0) reason :GPU Hang
Oct 01 20:23:05 llama ollama[2180]: time=2025-10-01T20:23:05.778Z level=ERROR source=server.go:1459 msg="post predict" error="Post \"http://127.0.0.1:36499/completion\": EOF"
Oct 01 20:23:05 llama ollama[2180]: [GIN] 2025/10/01 - 20:23:05 | 200 |  2.601440815s |       127.0.0.1 | POST     "/api/chat"
Oct 01 20:23:05 llama ollama[2180]: time=2025-10-01T20:23:05.778Z level=DEBUG source=sched.go:377 msg="context for request finished" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096
Oct 01 20:23:05 llama ollama[2180]: time=2025-10-01T20:23:05.779Z level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 duration=5m0s
Oct 01 20:23:05 llama ollama[2180]: time=2025-10-01T20:23:05.779Z level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 refCount=0
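
The "HW Exception by GPU node-1 ... reason :GPU Hang" line above appears to come from the HSA runtime rather than the kernel, so it may be worth checking whether amdgpu also logged a queue eviction or attempted a reset around the same time, with something like:

~$ sudo dmesg -T | grep -iE 'amdgpu|gpu (hang|reset)|ring .* timeout'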

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.12.3

OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm LD_LIBRARY_PATH=/usr/local/lib/ollama/rocm:/usr/local/lib/ollama/rocm:/usr/local/lib/ollama:/usr/local/lib/ollama ROCR_VISIBLE_DEVICES=0 Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=INFO source=server.go:672 msg="loading model" "model layers"=49 requested=-1 Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="31.1 GiB" before.free="28.6 GiB" before.free_swap="8.0 GiB" now.total="31.1 GiB" now.free="28.6 GiB" now.free_swap="8.0 GiB" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=DEBUG source=amd_linux.go:492 msg="updating rocm free memory" gpu=0 name=1002:1586 before="95.9 GiB" now="95.9 GiB" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=INFO source=server.go:678 msg="system memory" total="31.1 GiB" free="28.6 GiB" free_swap="8.0 GiB" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.657Z level=INFO source=server.go:686 msg="gpu memory" id=0 available="95.4 GiB" free="95.9 GiB" minimum="457.0 MiB" overhead="0 B" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.666Z level=INFO source=runner.go:1252 msg="starting ollama engine" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.666Z level=INFO source=runner.go:1287 msg="Server listening on 127.0.0.1:36499" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.669Z level=INFO source=runner.go:1171 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:16 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default="" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default="" Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=INFO source=ggml.go:131 msg="" architecture=llama4 file_type=Q4_K_M name="" description="" num_tensors=1182 num_key_values=45 Oct 01 20:22:36 llama ollama[2180]: time=2025-10-01T20:22:36.710Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama Oct 01 20:22:37 llama ollama[2180]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Oct 01 20:22:37 llama ollama[2180]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Oct 01 20:22:37 llama ollama[2180]: ggml_cuda_init: found 1 ROCm devices: Oct 01 20:22:37 llama ollama[2180]: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0 Oct 01 20:22:37 llama ollama[2180]: load_backend: loaded ROCm backend from /usr/local/lib/ollama/libggml-hip.so Oct 01 20:22:37 llama ollama[2180]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.474Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/rocm Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.474Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 
CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.pooling_type default=0 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.num_channels default=3 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.max_upscaling_size default=448 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.rope.scaling.factor default=1 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.no_rope_interval default=4 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.temperature_tuning default=true Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.scale default=0.10000000149011612 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.477Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.floor_scale default=8192 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.784Z level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2804 splits=1 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2545 splits=2 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:310 msg="model weights" device=ROCm0 size="62.3 GiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="554.9 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:332 msg="compute graph" device=ROCm0 size="433.1 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.785Z level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="10.0 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=DEBUG source=backend.go:342 msg="total memory" size="64.0 GiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=581898240U required.CPU.Graph=10485760U required.ROCm0.ID=0 required.ROCm0.Weights="[1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 1430364416U 1246535936U 1246535936U 
1430364416U 1246535936U 1246535936U 1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 1430364416U 2595370496U]" required.ROCm0.Cache="[16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 0U]" required.ROCm0.Graph=454146304U Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=DEBUG source=server.go:894 msg="available gpu" id=0 "available layer vram"="95.0 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="433.1 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:0 Layers:49(0..48)]" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.786Z level=INFO source=runner.go:1171 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:16 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.812Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.pooling_type default=0 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.num_channels default=3 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.vision.max_upscaling_size default=448 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.rope.scaling.factor default=1 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.no_rope_interval default=4 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.temperature_tuning default=true Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.scale default=0.10000000149011612 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.816Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.attention.floor_scale default=8192 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.954Z level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2804 splits=1 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2545 splits=2 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:310 msg="model weights" device=ROCm0 size="62.3 GiB" Oct 01 20:22:37 llama 
ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="554.9 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:332 msg="compute graph" device=ROCm0 size="433.1 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="10.0 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=backend.go:342 msg="total memory" size="64.0 GiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=581898240A required.CPU.Graph=10485760A required.ROCm0.ID=0 required.ROCm0.Weights="[1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1246535936A 1246535936A 1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 1430364416A 2595370496A]" required.ROCm0.Cache="[16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 0U]" required.ROCm0.Graph=454146304A Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.965Z level=DEBUG source=server.go:894 msg="available gpu" id=0 "available layer vram"="95.0 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="433.1 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:0 Layers:49(0..48)]" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=runner.go:1171 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:16 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=ggml.go:487 msg="offloading 48 repeating layers to GPU" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=ggml.go:493 msg="offloading output layer to GPU" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=ggml.go:498 msg="offloaded 49/49 layers to GPU" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:310 msg="model weights" device=ROCm0 size="62.3 GiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:315 msg="model weights" device=CPU size="554.9 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO 
source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:332 msg="compute graph" device=ROCm0 size="433.1 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:337 msg="compute graph" device=CPU size="10.0 MiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=backend.go:342 msg="total memory" size="64.0 GiB" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=sched.go:470 msg="loaded runners" count=1 Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=server.go:1251 msg="waiting for llama runner to start responding" Oct 01 20:22:37 llama ollama[2180]: time=2025-10-01T20:22:37.966Z level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model" Oct 01 20:22:38 llama ollama[2180]: time=2025-10-01T20:22:38.217Z level=DEBUG source=server.go:1295 msg="model load progress 0.02" Oct 01 20:22:38 llama ollama[2180]: time=2025-10-01T20:22:38.467Z level=DEBUG source=server.go:1295 msg="model load progress 0.04" Oct 01 20:22:38 llama ollama[2180]: time=2025-10-01T20:22:38.718Z level=DEBUG source=server.go:1295 msg="model load progress 0.05" Oct 01 20:22:38 llama ollama[2180]: time=2025-10-01T20:22:38.969Z level=DEBUG source=server.go:1295 msg="model load progress 0.07" Oct 01 20:22:39 llama ollama[2180]: time=2025-10-01T20:22:39.220Z level=DEBUG source=server.go:1295 msg="model load progress 0.09" Oct 01 20:22:39 llama ollama[2180]: time=2025-10-01T20:22:39.471Z level=DEBUG source=server.go:1295 msg="model load progress 0.11" Oct 01 20:22:39 llama ollama[2180]: time=2025-10-01T20:22:39.721Z level=DEBUG source=server.go:1295 msg="model load progress 0.13" Oct 01 20:22:39 llama ollama[2180]: time=2025-10-01T20:22:39.972Z level=DEBUG source=server.go:1295 msg="model load progress 0.15" Oct 01 20:22:40 llama ollama[2180]: time=2025-10-01T20:22:40.223Z level=DEBUG source=server.go:1295 msg="model load progress 0.17" Oct 01 20:22:40 llama ollama[2180]: time=2025-10-01T20:22:40.474Z level=DEBUG source=server.go:1295 msg="model load progress 0.18" Oct 01 20:22:40 llama ollama[2180]: time=2025-10-01T20:22:40.725Z level=DEBUG source=server.go:1295 msg="model load progress 0.20" Oct 01 20:22:40 llama ollama[2180]: time=2025-10-01T20:22:40.976Z level=DEBUG source=server.go:1295 msg="model load progress 0.22" Oct 01 20:22:41 llama ollama[2180]: time=2025-10-01T20:22:41.226Z level=DEBUG source=server.go:1295 msg="model load progress 0.24" Oct 01 20:22:41 llama ollama[2180]: time=2025-10-01T20:22:41.477Z level=DEBUG source=server.go:1295 msg="model load progress 0.26" Oct 01 20:22:41 llama ollama[2180]: time=2025-10-01T20:22:41.728Z level=DEBUG source=server.go:1295 msg="model load progress 0.28" Oct 01 20:22:41 llama ollama[2180]: time=2025-10-01T20:22:41.979Z level=DEBUG source=server.go:1295 msg="model load progress 0.29" Oct 01 20:22:42 llama ollama[2180]: time=2025-10-01T20:22:42.230Z level=DEBUG source=server.go:1295 msg="model load progress 0.31" Oct 01 20:22:42 llama ollama[2180]: time=2025-10-01T20:22:42.481Z level=DEBUG source=server.go:1295 msg="model load progress 0.33" Oct 01 20:22:42 llama ollama[2180]: time=2025-10-01T20:22:42.731Z level=DEBUG source=server.go:1295 msg="model load progress 0.35" Oct 01 20:22:42 llama ollama[2180]: time=2025-10-01T20:22:42.983Z level=DEBUG source=server.go:1295 msg="model load 
progress 0.37" Oct 01 20:22:43 llama ollama[2180]: time=2025-10-01T20:22:43.233Z level=DEBUG source=server.go:1295 msg="model load progress 0.39" Oct 01 20:22:43 llama ollama[2180]: time=2025-10-01T20:22:43.484Z level=DEBUG source=server.go:1295 msg="model load progress 0.41" Oct 01 20:22:43 llama ollama[2180]: time=2025-10-01T20:22:43.735Z level=DEBUG source=server.go:1295 msg="model load progress 0.44" Oct 01 20:22:43 llama ollama[2180]: time=2025-10-01T20:22:43.986Z level=DEBUG source=server.go:1295 msg="model load progress 0.46" Oct 01 20:22:44 llama ollama[2180]: time=2025-10-01T20:22:44.237Z level=DEBUG source=server.go:1295 msg="model load progress 0.48" Oct 01 20:22:44 llama ollama[2180]: time=2025-10-01T20:22:44.488Z level=DEBUG source=server.go:1295 msg="model load progress 0.50" Oct 01 20:22:44 llama ollama[2180]: time=2025-10-01T20:22:44.739Z level=DEBUG source=server.go:1295 msg="model load progress 0.51" Oct 01 20:22:44 llama ollama[2180]: time=2025-10-01T20:22:44.989Z level=DEBUG source=server.go:1295 msg="model load progress 0.53" Oct 01 20:22:45 llama ollama[2180]: time=2025-10-01T20:22:45.240Z level=DEBUG source=server.go:1295 msg="model load progress 0.55" Oct 01 20:22:45 llama ollama[2180]: time=2025-10-01T20:22:45.491Z level=DEBUG source=server.go:1295 msg="model load progress 0.57" Oct 01 20:22:45 llama ollama[2180]: time=2025-10-01T20:22:45.741Z level=DEBUG source=server.go:1295 msg="model load progress 0.59" Oct 01 20:22:45 llama ollama[2180]: time=2025-10-01T20:22:45.992Z level=DEBUG source=server.go:1295 msg="model load progress 0.61" Oct 01 20:22:46 llama ollama[2180]: time=2025-10-01T20:22:46.243Z level=DEBUG source=server.go:1295 msg="model load progress 0.62" Oct 01 20:22:46 llama ollama[2180]: time=2025-10-01T20:22:46.494Z level=DEBUG source=server.go:1295 msg="model load progress 0.64" Oct 01 20:22:46 llama ollama[2180]: time=2025-10-01T20:22:46.745Z level=DEBUG source=server.go:1295 msg="model load progress 0.66" Oct 01 20:22:46 llama ollama[2180]: time=2025-10-01T20:22:46.995Z level=DEBUG source=server.go:1295 msg="model load progress 0.68" Oct 01 20:22:47 llama ollama[2180]: time=2025-10-01T20:22:47.246Z level=DEBUG source=server.go:1295 msg="model load progress 0.70" Oct 01 20:22:47 llama ollama[2180]: time=2025-10-01T20:22:47.497Z level=DEBUG source=server.go:1295 msg="model load progress 0.72" Oct 01 20:22:47 llama ollama[2180]: time=2025-10-01T20:22:47.748Z level=DEBUG source=server.go:1295 msg="model load progress 0.74" Oct 01 20:22:47 llama ollama[2180]: time=2025-10-01T20:22:47.999Z level=DEBUG source=server.go:1295 msg="model load progress 0.76" Oct 01 20:22:48 llama ollama[2180]: time=2025-10-01T20:22:48.250Z level=DEBUG source=server.go:1295 msg="model load progress 0.77" Oct 01 20:22:48 llama ollama[2180]: time=2025-10-01T20:22:48.503Z level=DEBUG source=server.go:1295 msg="model load progress 0.79" Oct 01 20:22:48 llama ollama[2180]: time=2025-10-01T20:22:48.754Z level=DEBUG source=server.go:1295 msg="model load progress 0.81" Oct 01 20:22:49 llama ollama[2180]: time=2025-10-01T20:22:49.005Z level=DEBUG source=server.go:1295 msg="model load progress 0.83" Oct 01 20:22:49 llama ollama[2180]: time=2025-10-01T20:22:49.255Z level=DEBUG source=server.go:1295 msg="model load progress 0.85" Oct 01 20:22:49 llama ollama[2180]: time=2025-10-01T20:22:49.506Z level=DEBUG source=server.go:1295 msg="model load progress 0.87" Oct 01 20:22:49 llama ollama[2180]: time=2025-10-01T20:22:49.757Z level=DEBUG source=server.go:1295 msg="model load progress 0.88" 
Oct 01 20:22:50 llama ollama[2180]: time=2025-10-01T20:22:50.008Z level=DEBUG source=server.go:1295 msg="model load progress 0.90" Oct 01 20:22:50 llama ollama[2180]: time=2025-10-01T20:22:50.259Z level=DEBUG source=server.go:1295 msg="model load progress 0.92" Oct 01 20:22:50 llama ollama[2180]: time=2025-10-01T20:22:50.510Z level=DEBUG source=server.go:1295 msg="model load progress 0.94" Oct 01 20:22:50 llama ollama[2180]: time=2025-10-01T20:22:50.760Z level=DEBUG source=server.go:1295 msg="model load progress 0.96" Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.011Z level=DEBUG source=server.go:1295 msg="model load progress 0.98" Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.262Z level=DEBUG source=server.go:1295 msg="model load progress 0.99" Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.403Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama4.pooling_type default=0 Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.514Z level=INFO source=server.go:1289 msg="llama runner started in 14.86 seconds" Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.514Z level=DEBUG source=sched.go:482 msg="finished setting up" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 Oct 01 20:22:51 llama ollama[2180]: [GIN] 2025/10/01 - 20:22:51 | 200 | 15.082001029s | 127.0.0.1 | POST "/api/generate" Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.514Z level=DEBUG source=sched.go:490 msg="context for request finished" Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.514Z level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 duration=5m0s Oct 01 20:22:51 llama ollama[2180]: time=2025-10-01T20:22:51.515Z level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 refCount=0 Oct 01 20:22:53 llama ollama[2180]: [GIN] 2025/10/01 - 20:22:53 | 200 | 44.612µs | 100.102.200.84 | GET "/" Oct 01 20:22:53 llama ollama[2180]: [GIN] 2025/10/01 - 20:22:53 | 200 | 16.36µs | 100.102.200.84 | GET "/" Oct 01 20:23:03 llama ollama[2180]: time=2025-10-01T20:23:03.281Z level=DEBUG source=sched.go:580 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 Oct 01 20:23:03 llama ollama[2180]: time=2025-10-01T20:23:03.307Z level=DEBUG source=server.go:1388 msg="completion request" images=0 prompt=1727 format="" Oct 01 20:23:03 llama ollama[2180]: time=2025-10-01T20:23:03.338Z level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=345 used=0 remaining=345 Oct 01 20:23:05 llama 
ollama[2180]: HW Exception by GPU node-1 (Agent handle: 0x723f9c692ba0) reason :GPU Hang Oct 01 20:23:05 llama ollama[2180]: time=2025-10-01T20:23:05.778Z level=ERROR source=server.go:1459 msg="post predict" error="Post \"http://127.0.0.1:36499/completion\": EOF" Oct 01 20:23:05 llama ollama[2180]: [GIN] 2025/10/01 - 20:23:05 | 200 | 2.601440815s | 127.0.0.1 | POST "/api/chat" Oct 01 20:23:05 llama ollama[2180]: time=2025-10-01T20:23:05.778Z level=DEBUG source=sched.go:377 msg="context for request finished" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 Oct 01 20:23:05 llama ollama[2180]: time=2025-10-01T20:23:05.779Z level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 duration=5m0s Oct 01 20:23:05 llama ollama[2180]: time=2025-10-01T20:23:05.779Z level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=registry.ollama.ai/library/llama4:latest runner.inference=rocm runner.devices=1 runner.size="64.0 GiB" runner.vram="64.0 GiB" runner.parallel=1 runner.pid=2217 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-9d507a36062c2845dd3bb3e93364e9abc1607118acd8650727a700f72fb126e5 runner.num_ctx=4096 refCount=0 ``` ### OS Linux ### GPU AMD ### CPU AMD ### Ollama version 0.12.3
GiteaMirror added the amd, bug labels 2026-04-22 17:16:41 -05:00

@rick-github commented on GitHub (Oct 1, 2025):

Oct 01 20:23:05 llama ollama[2180]: HW Exception by GPU node-1 (Agent handle: 0x723f9c692ba0) reason :GPU Hang

This seems to be a problem with the Radeon 8060S on Linux systems. I have seen several workarounds suggested but none have worked so far. It looks like a kernel/driver issue.
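
To confirm whether the kernel driver is the failing piece, it helps to watch the kernel log while reproducing the hang (a minimal sketch; the grep pattern is just a convenience, and the model name is whichever one triggers it for you):

```shell
# Terminal 1: follow kernel messages for amdgpu hangs/resets
sudo dmesg --follow | grep -iE 'amdgpu|hang|reset'
# Terminal 2: reproduce the failure (model name is illustrative)
ollama run llama4 "test prompt"
```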


@PDGIII commented on GitHub (Oct 1, 2025):

> Oct 01 20:23:05 llama ollama[2180]: HW Exception by GPU node-1 (Agent handle: 0x723f9c692ba0) reason :GPU Hang
>
> This seems to be a problem with the Radeon 8060S on Linux systems. I have seen several workarounds suggested but none have worked so far. It looks like a kernel/driver issue.

Kinda new to working with AMD so if anyone knows something that works, I'm game.

FWIW, I'm using Ubuntu 24.04 with the HWE kernel and installed the ROCm drivers via [AMD's guide](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html):

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.3 LTS
Release:	24.04
Codename:	noble

$ uname -a
Linux llama 6.14.0-33-generic #33~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 19 17:02:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ dpkg -l | grep rocm-core
ii  rocm-core                             7.0.1.70001-42~24.04                    amd64        ROCm Runtime software stack

@PDGIII commented on GitHub (Oct 10, 2025):

FWIW, this is also occurring with version 0.12.5-rc:

ollama -v
ollama version is 0.12.5-rc

@rick-github commented on GitHub (Oct 10, 2025):

I don't think it's an ollama issue, looks like a kernel/driver issue.


@gavinbarnard commented on GitHub (Oct 11, 2025):

I'm using ROCm 6.4.4 and have had some similar GPU hang events when loading gpt-oss:120b.

I was having issues where the model would take about 14-16 seconds to load. If it loaded in under 15 seconds, I could interact without issues; if it took 15 seconds or longer, a GPU Hang message followed within milliseconds of the runner returning ready.

gavin@evox2:~$ cat /sys/module/amdgpu/parameters/lockup_timeout
60
gavin@evox2:~$ cat /sys/module/amdgpu/parameters/queue_preemption_timeout_ms
60000

Now I'm just fighting a memory leak.
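
Those are module parameters, so experimenting with them means setting them on the kernel command line rather than writing to sysfs (a minimal sketch, assuming Ubuntu with GRUB; the 10000 ms value is arbitrary and this is not a confirmed fix for the hang):

```shell
# /etc/default/grub -- append to the existing kernel command line, e.g.:
#   GRUB_CMDLINE_LINUX="amdgpu.lockup_timeout=10000"
sudo update-grub
sudo reboot
# Confirm the new value took effect after reboot:
cat /sys/module/amdgpu/parameters/lockup_timeout
```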


@PDGIII commented on GitHub (Oct 11, 2025):

@rick-github I think you're right as I'm seeing this in dmesg:

[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1002
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: Failed to evict queue 1
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: Failed to evict process queues
[Sat Oct 11 22:19:13 2025] amdgpu: Failed to quiesce KFD
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: GPU reset begin!
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: Dumping IP State
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: Dumping IP State Completed
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: MODE2 reset
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: GPU reset succeeded, trying to resume
[Sat Oct 11 22:19:13 2025] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: SMU is resuming...
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: SMU is resumed successfully!
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f2c000000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f32c00000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f4b200000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f69400000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu: Freeing queue vital buffer 0x721f6aa00000, queue evicted
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x09002600
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 4 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring jpeg_dec_1 uses VM inv eng 6 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: ring vpe uses VM inv eng 7 on hub 8
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: amdgpu: GPU reset(88) succeeded!
[Sat Oct 11 22:19:13 2025] amdgpu 0000:c2:00.0: [drm] device wedged, but recovered through reset

I've got the coredump from: /sys/class/drm/card1/device/devcoredump/data if anyone's interested...
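
For anyone else collecting one of these: the devcoredump node is transient (the kernel drops it after a timeout, and writing to it clears it), so it's worth saving a copy right away if you want to attach it to a driver bug report (a minimal sketch; the card1 path comes from the dmesg output above):

```shell
# Save the GPU coredump before the kernel discards it (root required)
sudo cat /sys/class/drm/card1/device/devcoredump/data > ~/gpu-hang-coredump.txt
```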


@wszgrcy commented on GitHub (Oct 13, 2025):

I also encountered the same error, but with PyTorch rather than Ollama. I guess this problem may be related to the integrated GPU? (I used a 780M.)
It occurs intermittently.


@k3mist commented on GitHub (Oct 13, 2025):

I'm running into this as well. It definitely seems kernel/driver related:

[24682.030485] amdxdna 0000:c4:00.1: amdxdna_fw_log_init: Failed to allocate fw log buffer of size: 0x400000
[24682.030672] amdxdna 0000:c4:00.1: amdxdna_fw_log_resume: Failed to enable firmware logging: -12
[24682.030678] amdxdna 0000:c4:00.1: amdxdna_pm_resume_get: Resume failed: -12
[24682.030685] amdxdna 0000:c4:00.1: amdxdna_pm_resume_get: Resume failed: -22
[24682.031097] amdxdna 0000:c4:00.1: device is already started
[24682.032489] amdxdna 0000:c4:00.1: amdxdna_fw_log_init: Failed to allocate fw log buffer of size: 0x400000
[24682.032672] amdxdna 0000:c4:00.1: amdxdna_fw_log_resume: Failed to enable firmware logging: -12
[24682.032674] amdxdna 0000:c4:00.1: amdxdna_pm_resume_get: Resume failed: -12
[24682.032677] amdxdna 0000:c4:00.1: amdxdna_pm_resume_get: Resume failed: -22
[24682.034271] amdxdna 0000:c4:00.1: device is already started
[24682.035115] amdxdna 0000:c4:00.1: amdxdna_fw_log_init: Failed to allocate fw log buffer of size: 0x400000
[24682.035226] amdxdna 0000:c4:00.1: amdxdna_fw_log_resume: Failed to enable firmware logging: -12
[24682.035228] amdxdna 0000:c4:00.1: amdxdna_pm_resume_get: Resume failed: -12
[24682.035245] amdxdna 0000:c4:00.1: amdxdna_pm_resume_get: Resume failed: -22
[24738.120118] amdxdna 0000:c4:00.1: device is already started
[24906.269424] amdgpu 0000:c3:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
[24906.269579] amdgpu 0000:c3:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1002
[24906.269685] amdgpu 0000:c3:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
[24906.269781] amdgpu 0000:c3:00.0: amdgpu: Failed to evict queue 1
[24906.269957] amdgpu 0000:c3:00.0: amdgpu: Failed to evict process queues
[24906.270180] amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!
[24906.270259] amdgpu 0000:c3:00.0: amdgpu: Dumping IP State
[24906.271846] amdgpu 0000:c3:00.0: amdgpu: Dumping IP State Completed
[24906.339472] amdgpu: Freeing queue vital buffer 0x7e61da200000, queue evicted
[24906.339481] amdgpu: Freeing queue vital buffer 0x7e665d400000, queue evicted
[24906.339483] amdgpu: Freeing queue vital buffer 0x7e666d000000, queue evicted
[24906.339485] amdgpu: Freeing queue vital buffer 0x7e6675600000, queue evicted
[24906.339487] amdgpu: Freeing queue vital buffer 0x7e6676c00000, queue evicted
[24907.311429] gmc_v11_0_process_interrupt: 2 callbacks suppressed
[24907.311440] amdgpu 0000:c3:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)
[24907.311459] amdgpu 0000:c3:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[24907.311464] amdgpu 0000:c3:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B53
[24907.311468] amdgpu 0000:c3:00.0: amdgpu:      Faulty UTCL2 client ID: CPC (0x5)
[24907.311471] amdgpu 0000:c3:00.0: amdgpu:      MORE_FAULTS: 0x1
[24907.311474] amdgpu 0000:c3:00.0: amdgpu:      WALKER_ERROR: 0x1
[24907.311476] amdgpu 0000:c3:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[24907.311478] amdgpu 0000:c3:00.0: amdgpu:      MAPPING_ERROR: 0x1
[24907.311480] amdgpu 0000:c3:00.0: amdgpu:      RW: 0x1
[24907.311491] amdgpu 0000:c3:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
[24907.311494] amdgpu 0000:c3:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[24908.949805] amdgpu 0000:c3:00.0: amdgpu: MES failed to respond to msg=SUSPEND
[24908.949814] [drm:amdgpu_mes_suspend [amdgpu]] *ERROR* failed to suspend all gangs
[24908.950019] amdgpu 0000:c3:00.0: amdgpu: suspend of IP block <mes_v11_0> failed -110
[24909.316799] amdgpu 0000:c3:00.0: amdgpu: MODE2 reset
[24909.351394] amdgpu 0000:c3:00.0: amdgpu: GPU reset succeeded, trying to resume
[24909.351901] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[24909.351993] amdgpu 0000:c3:00.0: amdgpu: SMU is resuming...
[24909.367333] amdgpu 0000:c3:00.0: amdgpu: SMU is resumed successfully!
[24909.382967] [drm] DMUB hardware initialized: version=0x09000F00
[24909.412101] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[24909.412104] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[24909.412104] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[24909.412105] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[24909.412106] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[24909.412106] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[24909.412107] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[24909.412108] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[24909.412108] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[24909.412109] amdgpu 0000:c3:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[24909.412110] amdgpu 0000:c3:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[24909.412110] amdgpu 0000:c3:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
[24909.412111] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 4 on hub 8
[24909.412112] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec_1 uses VM inv eng 6 on hub 8
[24909.412113] amdgpu 0000:c3:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[24909.412113] amdgpu 0000:c3:00.0: amdgpu: ring vpe uses VM inv eng 7 on hub 8
[24909.452320] amdgpu 0000:c3:00.0: amdgpu: GPU reset(5) succeeded!

@k3mist commented on GitHub (Oct 13, 2025):

A workaround for this (at least it appears to be working so far 🤞) is GRUB_CMDLINE_LINUX="amdgpu.dc=0". This will disable the display core (i.e., no functioning display/monitor). That works for me because I'm running my Framework Desktop as a server.

Prior to adding that to GRUB I was consistently only getting 5-7 prompts in before a crash; I think I'm well above 40+ at this point in the same ollama run session.

Edit: it ran fine last night for ~5 hours straight.
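
Applying it looks roughly like this, for anyone who wants to try (a sketch assuming Ubuntu with GRUB; remember it disables display output until you revert it):

```shell
# /etc/default/grub -- set or extend the kernel command line:
#   GRUB_CMDLINE_LINUX="amdgpu.dc=0"
sudo update-grub
sudo reboot
# Verify after reboot:
cat /proc/cmdline
cat /sys/module/amdgpu/parameters/dc    # should report 0
```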

For context:

-> % ollama --version
ollama version is 0.12.5
-> % xrt-smi examine  
System Configuration
  OS Name              : Linux
  Release              : 6.14.0-33-generic
  Machine              : x86_64
  CPU Cores            : 32
  Memory               : 31874 MB
  Distribution         : Ubuntu 24.04.3 LTS
  GLIBC                : 2.39
  Model                : Desktop (AMD Ryzen AI Max 300 Series)
  BIOS Vendor          : INSYDE Corp.
  BIOS Version         : 03.02
  Processor            : AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

XRT
  Version              : 2.20.0
  Branch               : HEAD
  Hash                 : 9b916190ad2865a7e7e497155c66b3b4940f0add
  Hash Date            : 2025-10-06 13:55:12
  amdxdna              : 2.20.0_20251006, 6e2c124330b8644f89635a2169e15cf6f217f0cd
  virtio-pci           : unknown, unknown
  NPU Firmware Version : 255.0.5.35

Device(s) Present
|BDF             |Name            |
|----------------|----------------|
|[0000:c4:00.1]  |NPU Strix Halo  |
Author
Owner

@wszgrcy commented on GitHub (Oct 15, 2025):

> a workaround for this (at least appears to be working so far 🤞) is `GRUB_CMDLINE_LINUX="amdgpu.dc=0"`. This will disable the display core (e.g., no functioning display/monitor). This works for me because I'm running my Framework Desktop as a server.
>
> Prior to adding that to grub I was only getting 5-7 prompts in before a crash, consistently. I think I'm well above 40 at this point in the same `ollama run` session.
>
> Edit: ran fine last night for ~5 hours straight.
>
> For context:
>
> ```
> -> % ollama --version
> ollama version is 0.12.5
> ```
>
> ```
> -> % xrt-smi examine
> System Configuration
>   OS Name              : Linux
>   Release              : 6.14.0-33-generic
>   Machine              : x86_64
>   CPU Cores            : 32
>   Memory               : 31874 MB
>   Distribution         : Ubuntu 24.04.3 LTS
>   GLIBC                : 2.39
>   Model                : Desktop (AMD Ryzen AI Max 300 Series)
>   BIOS Vendor          : INSYDE Corp.
>   BIOS Version         : 03.02
>   Processor            : AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
>
> XRT
>   Version              : 2.20.0
>   Branch               : HEAD
>   Hash                 : 9b916190ad2865a7e7e497155c66b3b4940f0add
>   Hash Date            : 2025-10-06 13:55:12
>   amdxdna              : 2.20.0_20251006, 6e2c124330b8644f89635a2169e15cf6f217f0cd
>   virtio-pci           : unknown, unknown
>   NPU Firmware Version : 255.0.5.35
>
> Device(s) Present
> |BDF             |Name            |
> |----------------|----------------|
> |[0000:c4:00.1]  |NPU Strix Halo  |
> ```

After testing, I found that it didn't work (it still reported an error).

I access the GPU through this chain: physical machine (8745H) => PVE 9 => GPU passthrough => Ubuntu 24.04 => Docker.

I also changed the GTT size:

```
GRUB_CMDLINE_LINUX="amd_iommu=off amdgpu.gttsize=46080 ttm.pages_limit=11796480 ttm.page_pool_size=11796480 amdgpu.dc=0"
update-grub
```

I think the problem may be that there are too many intermediate layers, which makes it difficult to identify which step is causing the error.
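
As a sanity check on the numbers in that command line: `amdgpu.gttsize` is specified in MiB while the `ttm` limits are in pages, so (assuming the usual 4 KiB page size) both values describe the same ~45 GiB budget:

```sh
# 46080 MiB of GTT expressed as 4 KiB pages:
#   46080 MiB * 1024 KiB/MiB / 4 KiB/page = 11796480 pages
echo $(( 46080 * 1024 / 4 ))   # prints 11796480
```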


@k3mist commented on GitHub (Oct 15, 2025):

@wszgrcy sorry you're still experiencing that. I've only had GPU hangs when messing with large context lengths in ollama since I set that in grub. It's likely there are still underlying issues with the AMD stack, but the only thing on my grub line is `amdgpu.dc=0` and I'm just letting the driver handle the rest, like GTT (for now).

```
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="amdgpu.dc=0"
```

I'm also running ollama directly on the machine via systemd for performance reasons. I'm currently testing with this config in the ollama systemd service.

Environment="OLLAMA_CONTEXT_LENGTH=66560"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_FLASH_ATTENTION=1"

Maybe the proxmox > passthrough > ubuntu > docker > ollama chain is the issue? That's a lot of layers between the GPU and ollama, and something may be going wrong in between them.

Your hardware is also very different, and technically speaking you would be using a slightly different driver path at the kernel level; for instance, my AMD GPU target is `gfx1151`.

```
-> % rocminfo | grep gfx
  Name:                    gfx1151
      Name:                    amdgcn-amd-amdhsa--gfx1151
      Name:                    amdgcn-amd-amdhsa--gfx11-generic
```

@rick-github commented on GitHub (Oct 15, 2025):

I have a 395+ with 96G VRAM and `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.no_system_mem_limit amdgpu.dc=0"`. A recent workload that caused multiple GPU hangs ran yesterday without a problem. On the downside, the bump to 0.12.5 resulted in multiple crashes of the runner with

```
ollama  | ollama: llama-sampling.cpp:662: void llama_sampler_dist_apply(llama_sampler*, llama_token_data_array*): Assertion `found' failed.
```

which appears to be a llama.cpp + ROCm [issue](https://github.com/ggml-org/llama.cpp/issues/15551#issuecomment-3259886592).


@k3mist commented on GitHub (Oct 15, 2025):

@rick-github

Has there been any version you've seen stability with?

A few days ago was the first time I've tried running models, so I have no baseline.


@rick-github commented on GitHub (Oct 15, 2025):

AMD GPU hangs have been an issue since about the end of June, which is when I acquired the machine. Yesterday was the first day without one.


@PDGIII commented on GitHub (Oct 15, 2025):

FWIW, switching to kernel 6.14.0-1014-oem and ROCm 6.4.4 fixed my issue.

Previous unstable configuration:

- OS: Ubuntu 24.04.3 LTS
- Kernel: 6.14.0-33-generic
- ROCm: 7.0.1

Current stable configuration:

- OS: Ubuntu 24.04.3 LTS
- Kernel: 6.14.0-1014-oem
- ROCm: 6.4.4
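
A minimal sketch of the kernel half of that switch on Ubuntu 24.04 (the OEM metapackage name is an assumption, so verify with `apt search linux-oem`; the ROCm 6.4.4 downgrade is a separate step that depends on how ROCm was installed):

```sh
sudo apt update
sudo apt install linux-oem-24.04   # assumed metapackage name
sudo reboot
uname -r                           # expect something like 6.14.0-1014-oem
```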

@rick-github commented on GitHub (Oct 15, 2025):

kernel: 6.11.0-29-generic
ROCm: 6.4.1


@k3mist commented on GitHub (Oct 15, 2025):

@PDGIII what settings are you using for ollama?

I may try that OEM kernel.

```
-> % rocminfo
ROCk module version 6.12.12 is loaded
```

I got hangs when I tried to run parallel workloads just now, but I think that was because of gpt-oss 120b in VRAM.

Switching to `qwen3-coder:30b-a3b-fp16` for the agent and `qwen2.5-coder:1.5b` for autocomplete seems to be OK, but I've only tested for about 10 minutes.

It's mostly been stable with the settings below, but I just turned on max loaded models and num parallel.

Environment="OLLAMA_CONTEXT_LENGTH=66560"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_FLASH_ATTENTION=1"

Just added:

Environment="OLLAMA_MAX_LOADED_MODELS=2"
Environment="OLLAMA_NUM_PARALLEL=2"

@koasi commented on GitHub (Nov 8, 2025):

I am using an AMD AI Max 395 and have encountered the same issue. Nevertheless, after configuring the amdgpu TTM parameters, the problem appears to occur less frequently. It happened on gpt-oss:20b and llama3.1:70b.

- OS: Ubuntu 24
- Kernel: 6.14.0-35-generic
- ROCk module: 6.16.6
- ollama: 0.12.10


@dhiltgen commented on GitHub (Nov 14, 2025):

In 0.12.11, Vulkan is now included in the official binaries, but it is still experimental. It might be worth trying on these systems to see if it behaves any better than ROCm. To enable, set `OLLAMA_VULKAN=1` for the server: https://github.com/ollama/ollama/blob/main/docs/faq.mdx#how-do-i-configure-ollama-server
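
A quick way to test that without editing the service configuration (one-off foreground run; stop the system service first so the manual run can bind the default port):

```sh
sudo systemctl stop ollama
OLLAMA_VULKAN=1 ollama serve
```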

Reference: github-starred/ollama#34047