[GH-ISSUE #8463] AMD Radeon RX6700XT unable to take input #5444

Closed
opened 2026-04-12 16:40:40 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @bitfl0wer on GitHub (Jan 16, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8463

What is the issue?

When trying to use ollama's APIs, the llama server crashes while loading.

Obligatory System Information

CPU: AMD Ryzen 9 7900
RAM: 64GB DDR5
OS: Fedora Linux 41 (Workstation Edition) x86_64
Ollama host: Docker

docker-compose.yml

services:
  webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
       - 8003:8080/tcp
    environment:
       - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
       - webui:/app/backend/data
    depends_on:
       - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama:rocm
    environment:
      - HSA_OVERRIDE_GFX_VERSION="10.3.1"
      - AMD_SERIALIZE_KERNEL=3
      - OLLAMA_DEBUG=1
      - HIP_VISIBLE_DEVICES=0
      - OLLAMA_LLM_LIBRARY=rocm_v60102
    ports:
      - 11434:11434/tcp
    volumes:
      - ollama:/root/.ollama
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    restart: unless-stopped
volumes:
  ollama:
  webui:

Console output of error

https://pastebin.com/pTW3FMCp
Line 55 already hints at an error. The next troubling thing I could find was at line 166.

rocminfo

❯ rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 7900 12-Core Processor 
  ...[redacted because irrelevant]   
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1031                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6700 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      3072(0xc00) KB                     
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29663(0x73df)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2855                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 122                                
  SDMA engine uCode::      80                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    12566528(0xbfc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    12566528(0xbfc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1031         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

My user is in the video and render groups.

ls output of relevant /dev entries

❯ ls -lag /dev/dri /dev/kfd /dev/dri/*
crw-rw----@   226,1 root video  16 Jan 18:19 /dev/dri/card1
crw-rw-rw-    234,0 root render 16 Jan 15:46 /dev/kfd
crw-rw-rw-  226,128 root render 16 Jan 15:46 /dev/dri/renderD128

/dev/dri:
drwxr-xr-x        - root root   16 Jan 15:45 ./
drwxr-xr-x        - root root   16 Jan 23:20 ../
drwxr-xr-x        - root root   16 Jan 15:46 by-path/
crw-rw----@   226,1 root video  16 Jan 18:19 card1
crw-rw-rw-  226,128 root render 16 Jan 15:46 renderD128

/dev/dri/by-path:
drwxr-xr-x   - root root 16 Jan 15:46 ./
drwxr-xr-x   - root root 16 Jan 15:45 ../
lrwxrwxrwx@  8 root root 16 Jan 15:46 pci-0000:03:00.0-card -> ../card1
lrwxrwxrwx  13 root root 16 Jan 15:46 pci-0000:03:00.0-render -> ../renderD128

I just pulled the docker image half an hour ago, so it should be the most up to date. Nevertheless, here is the hash of the image in use: `sha256:9874ece252bfd8404e2795066649953255abc29e7d6aeab1966d19fadf9f06c4`

If any more information is needed, I am happy to supply it. :)
Thank you a lot for your time and effort.

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 16:40:40 -05:00
Author
Owner

@rick-github commented on GitHub (Jan 17, 2025):

Try removing the quotes around 10.3.1.
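For reference, in the list form of Compose's `environment`, everything after the first `=` is taken as the literal value, so the quote characters themselves become part of the variable and ROCm sees `"10.3.1"` instead of `10.3.1`. The unquoted form would look like this (a minimal fragment of the stanza above):

```yaml
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.1   # no quotes: the value is exactly 10.3.1
```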

<!-- gh-comment-id:2597211456 -->
Author
Owner

@bitfl0wer commented on GitHub (Jan 17, 2025):

> Try removing the quotes around 10.3.1.

That does not change anything.

<!-- gh-comment-id:2598950523 -->
Author
Owner

@Vilchis-Joshua commented on GitHub (Jan 18, 2025):

I would love to know how you got even this far.
I'm running on Debian Bookworm and whenever I exec into my ollama/ollama:rocm image, I can't even get it to recognize rocminfo.

Have you already confirmed that you can run it locally? (i.e., AMD drivers are all present and the ROCm installation is working)

<!-- gh-comment-id:2599414509 -->
Author
Owner

@rick-github commented on GitHub (Jan 18, 2025):

I had a spare ROCm device and I tried to replicate. I took your ollama stanza, and after `up` and a model load I got the same error:

Error: llama runner process has terminated: error:Could not initialize Tensile host: No devices found

Then I added these lines to the ollama stanza:

  cap_add:
    - PERFMON
    - CAP_PERFMON
  privileged: true
  security_opt:
    - seccomp:unconfined

Did an `up`, loaded a model, prompted, and got a reply. So yay, that fixed it. Just to verify, I removed those lines and re-`up`'d the container, expecting it to fail again. It didn't. Now it works fine without those lines.

So try adding the above, it may or may not work.

Also, `OLLAMA_LLM_LIBRARY` is wrong; it should be `OLLAMA_LLM_LIBRARY=rocm`.

<!-- gh-comment-id:2599442956 -->
Author
Owner

@bitfl0wer commented on GitHub (Jan 18, 2025):

> I had a spare ROCm device and I tried to replicate. I took your ollama stanza, and after `up` and a model load I got the same error:
>
> Error: llama runner process has terminated: error:Could not initialize Tensile host: No devices found
>
> Then I added these lines to the ollama stanza:
>
>   cap_add:
>     - PERFMON
>     - CAP_PERFMON
>   privileged: true
>   security_opt:
>     - seccomp:unconfined
>
> Did an `up`, loaded a model, prompted, and got a reply. So yay, that fixed it. Just to verify, I removed those lines and re-`up`'d the container, expecting it to fail again. It didn't. Now it works fine without those lines.
>
> So try adding the above, it may or may not work.
>
> Also, `OLLAMA_LLM_LIBRARY` is wrong; it should be `OLLAMA_LLM_LIBRARY=rocm`.

Thank you for your time and effort!
It seems this issue has so many parameters at play that there is not just one solution.
I have also tried to debug this error and I have found out the following:

  • `HSA_OVERRIDE_GFX_VERSION` has to have its quotes removed, as you suggested. However, this was not the only issue, which is why I hadn't seen a change after applying (only) this fix.
  • `OLLAMA_LLM_LIBRARY` was indeed wrong. In fact, commenting out that environment variable entirely is sufficient for my setup! Combined with the first fix, my ollama instance can now process prompts.

Again, thank you for your efforts! They are much appreciated.
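Putting both findings together, the `environment` section of the ollama service ends up looking something like this (a sketch based on this thread; the override value is specific to the RX 6700 XT / gfx1031 setup above):

```yaml
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.1   # unquoted, per the fix above
      - AMD_SERIALIZE_KERNEL=3
      - OLLAMA_DEBUG=1
      - HIP_VISIBLE_DEVICES=0
      # OLLAMA_LLM_LIBRARY removed entirely; ollama's autodetection picks the runner
```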

<!-- gh-comment-id:2599656583 -->
Reference: github-starred/ollama#5444