[GH-ISSUE #3189] Add support for AMD Radeon 780M gfx1103 - override works #64003

Closed
opened 2026-05-03 15:46:50 -05:00 by GiteaMirror · 18 comments
Owner

Originally created by @thomas-0816 on GitHub (Mar 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3189

Originally assigned to: @dhiltgen on GitHub.

What are you trying to do?

Please support GPU acceleration using "AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics" on Linux (Ubuntu 22.04).

Newer notebooks ship with the AMD 7840U and support setting VRAM from 1GB to 8GB in the BIOS. With GPU acceleration only one vCPU is used, and the user experience with 7B models is quite good.

Not working ("amdgpu [0] gfx1103 is not supported"):

OLLAMA_LLM_LIBRARY="rocm_v60000" /usr/bin/ollama serve
time=2024-03-17T03:02:49.566+01:00 level=INFO source=images.go:806 msg="total blobs: 20"
time=2024-03-17T03:02:49.566+01:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-17T03:02:49.566+01:00 level=INFO source=routes.go:1110 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-17T03:02:49.566+01:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama3117537729/runners ..."
time=2024-03-17T03:02:51.423+01:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [rocm_v60000 cpu_avx cpu_avx2 cuda_v11 cpu]"
time=2024-03-17T03:02:51.424+01:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-17T03:02:51.424+01:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-17T03:02:51.427+01:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-17T03:02:51.427+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-17T03:02:51.427+01:00 level=INFO source=amd_linux.go:50 msg="AMD Driver: 6.3.6"
time=2024-03-17T03:02:51.427+01:00 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1103]"
time=2024-03-17T03:02:51.430+01:00 level=WARN source=amd_linux.go:114 msg="amdgpu [0] gfx1103 is not supported by /tmp/ollama3117537729/rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-03-17T03:02:51.430+01:00 level=WARN source=amd_linux.go:116 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-17T03:02:51.430+01:00 level=INFO source=amd_linux.go:127 msg="all detected amdgpus are skipped, falling back to CPU"
time=2024-03-17T03:02:51.430+01:00 level=INFO source=routes.go:1133 msg="no GPU detected"

My current workaround is to force gfx1102 (no issues so far):

HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000" /usr/bin/ollama serve
time=2024-03-17T03:04:54.436+01:00 level=INFO source=images.go:806 msg="total blobs: 20"
time=2024-03-17T03:04:54.436+01:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-17T03:04:54.437+01:00 level=INFO source=routes.go:1110 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-17T03:04:54.437+01:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama2902102366/runners ..."
time=2024-03-17T03:04:56.315+01:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60000 cpu cpu_avx]"
time=2024-03-17T03:04:56.315+01:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-17T03:04:56.315+01:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-17T03:04:56.317+01:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-17T03:04:56.317+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-17T03:04:56.317+01:00 level=INFO source=amd_linux.go:50 msg="AMD Driver: 6.3.6"
time=2024-03-17T03:04:56.318+01:00 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1103]"
time=2024-03-17T03:04:56.319+01:00 level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 8192M"
time=2024-03-17T03:04:56.319+01:00 level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory  8192M"

Note: using "rocm_v6" was not working for me, so I chose "rocm_v60000" (ls /tmp/ollama758986631/runners/ gives: cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60000).
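Incidentally, the override value is just the gfx target split into its major.minor.stepping digits. A small bash sketch of that mapping (the `gfx_to_override` helper is hypothetical and assumes a four-digit gfx11xx-style target):

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a four-digit gfx target (e.g. gfx1103) into the
# dotted form HSA_OVERRIDE_GFX_VERSION expects (e.g. 11.0.3).
gfx_to_override() {
  local gfx="${1#gfx}"   # strip the "gfx" prefix -> 1103
  printf '%d.%d.%d\n' "$((10#${gfx:0:2}))" "$((10#${gfx:2:1}))" "$((10#${gfx:3:1}))"
}

gfx_to_override gfx1103   # prints 11.0.3
```

Since no 11.0.3 kernels ship in the bundled ROCm, the nearest supported target (11.0.2, i.e. gfx1102) is what actually gets exported above.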

Some benchmarks using 7840U (numbers from second run), ubuntu 22.04, kernel 6.5, vram switched to 8GB in bios:

> CPU
OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve
ollama run llama2:latest "where was beethoven born?" --verbose
Ludwig van Beethoven was born in Bonn, Germany on December 16, 1770.
total duration:       4.343514826s
load duration:        264.691µs
prompt eval duration: 168.205ms
prompt eval rate:     0.00 tokens/s
eval count:           26 token(s)
eval duration:        4.174563s
eval rate:            6.23 tokens/s
> GPU
HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000u_avx2" /usr/bin/ollama serve
ollama run llama2:latest "where was beethoven born?" --verbose
Ludwig van Beethoven was born in Bonn, Germany on December 16, 1770.
total duration:       1.513455927s
load duration:        161.535µs
prompt eval duration: 65.979ms
prompt eval rate:     0.00 tokens/s
eval count:           27 token(s)
eval duration:        1.446805s
eval rate:            18.66 tokens/s
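For reference, the two runs above work out to roughly a 3x decode speedup. A quick awk check, with the eval rates copied from the logs:

```shell
# Speedup of the GPU run over the CPU run, using the eval rates above.
cpu_rate=6.23
gpu_rate=18.66
awk -v c="$cpu_rate" -v g="$gpu_rate" 'BEGIN { printf "%.1fx\n", g / c }'   # prints 3.0x
```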

How should we solve this?

No response

What is the impact of not solving this?

No response

Anything else?

No response

GiteaMirror added the amd, feature request, linux, windows labels 2026-05-03 15:46:51 -05:00
Author
Owner

@dhiltgen commented on GitHub (Mar 20, 2024):

Glad to hear the override is working. You shouldn't need to set OLLAMA_LLM_LIBRARY - it should auto-detect the Radeon GPU is present and use the correct library, only the HSA_OVERRIDE_GFX_VERSION should be required for your setup.

At the moment, we're not planning to auto-override as this can lead to crashes in some scenarios, so we're steering users to manually set this to ensure they understand what's going on. AMD is working on adding more general support for families of GPUs which should eliminate the need for this override approach.
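If you run the packaged systemd service rather than invoking `/usr/bin/ollama serve` by hand, one way to persist just the override is a drop-in (a sketch; the path assumes the standard `ollama.service` unit name on your distro):

```
# /etc/systemd/system/ollama.service.d/amd-override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
```

After writing the file, `systemctl daemon-reload` followed by `systemctl restart ollama` picks it up.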

Author
Owner

@wowpala commented on GitHub (Mar 23, 2024):

Same with me, on Windows.

Please also add support on Windows.

$env:OLLAMA_ORIGINS="app://obsidian.md*"; ollama serve
time=2024-03-24T02:21:24.575+08:00 level=INFO source=images.go:806 msg="total blobs: 8"
time=2024-03-24T02:21:24.599+08:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-24T02:21:24.601+08:00 level=INFO source=routes.go:1110 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-24T02:21:24.601+08:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to C:\\Users\\Jeff\\AppData\\Local\\Temp\\ollama3340791837\\runners ..."
time=2024-03-24T02:21:24.654+08:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu cpu_avx cuda_v11.3 rocm_v5.7 cpu_avx2]"
time=2024-03-24T02:21:29.415+08:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-24T02:21:29.415+08:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library nvml.dll"
time=2024-03-24T02:21:29.424+08:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-24T02:21:29.424+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-24T02:21:29.442+08:00 level=INFO source=amd_windows.go:40 msg="AMD Driver: 50422970"
time=2024-03-24T02:21:29.455+08:00 level=INFO source=amd_windows.go:69 msg="detected 1 hip devices"
time=2024-03-24T02:21:29.455+08:00 level=INFO source=amd_windows.go:87 msg="[0] Name: AMD Radeon(TM) 780M"
time=2024-03-24T02:21:29.455+08:00 level=INFO source=amd_windows.go:90 msg="[0] GcnArchName: gfx1103"
time=2024-03-24T02:21:29.455+08:00 level=WARN source=amd_windows.go:100 msg="amdgpu [0] gfx1103 is not supported by C:\\Users\\Jeff\\AppData\\Local\\Programs\\Ollama\\rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx906]"
time=2024-03-24T02:21:29.455+08:00 level=WARN source=amd_windows.go:102 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-24T02:21:29.455+08:00 level=INFO source=amd_windows.go:128 msg="all detected amdgpus are skipped, falling back to CPU"
time=2024-03-24T02:21:29.510+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-24T02:21:29.526+08:00 level=INFO source=amd_windows.go:40 msg="AMD Driver: 50422970"
time=2024-03-24T02:21:29.527+08:00 level=INFO source=amd_windows.go:69 msg="detected 1 hip devices"
time=2024-03-24T02:21:29.527+08:00 level=INFO source=amd_windows.go:87 msg="[0] Name: AMD Radeon(TM) 780M"
time=2024-03-24T02:21:29.527+08:00 level=INFO source=amd_windows.go:90 msg="[0] GcnArchName: gfx1103"
time=2024-03-24T02:21:29.527+08:00 level=WARN source=amd_windows.go:100 msg="amdgpu [0] gfx1103 is not supported by C:\\Users\\Jeff\\AppData\\Local\\Programs\\Ollama\\rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx906]"
time=2024-03-24T02:21:29.528+08:00 level=WARN source=amd_windows.go:102 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-24T02:21:29.528+08:00 level=INFO source=amd_windows.go:128 msg="all detected amdgpus are skipped, falling back to CPU"
time=2024-03-24T02:21:29.590+08:00 level=INFO source=llm.go:85 msg="GPU not available, falling back to CPU"
Author
Owner

@zlwu commented on GitHub (May 4, 2024):

Cool, the 7840U has AVX-512; are there any improvements over AVX2?

Author
Owner

@Thanos-CP commented on GitHub (May 5, 2024):

Hey, I have the same problem: my AMD 780M isn't recognized by Ubuntu, and you also can't use ROCm with this GPU. Could you maybe help me fix this? I'm new to this and don't know what to do. Thank you.

Author
Owner

@dhiltgen commented on GitHub (May 5, 2024):

To clarify: unfortunately the override is not supported on Windows due to ROCm limitations. That's tracked via #3107.

Author
Owner

@TheophileH commented on GitHub (May 6, 2024):

Hello,
The override HSA_OVERRIDE_GFX_VERSION=10.3.0 doesn't work for me.

I'm using
AMD GPU: gfx1032 (x2)
OS: Ubuntu 22.04
CPU with no AVX support

I got the following error:

May 06 09:29:20 user ollama[1550]: time=2024-05-06T09:29:20.541Z level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
May 06 09:29:20 user ollama[1550]: time=2024-05-06T09:29:20.592Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:37055/health\": dial tcp 127.0.0.1:37055: connect: connection refused"
May 06 09:29:21 user ollama[1550]: time=2024-05-06T09:29:21.091Z level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: -1 "
May 06 09:29:21 user ollama[1550]: time=2024-05-06T09:29:21.092Z level=DEBUG source=server.go:832 msg="stopping llama server"

The full log is available here: https://drive.google.com/file/d/1mOaelrj9qxqajab3YP1F5OBnhJpXqI9S/view?usp=sharing

Please, let me know what I should modify.

Author
Owner

@dhiltgen commented on GitHub (May 7, 2024):

@TheophileH I'm not positive, but you may have hit #4105

Author
Owner

@bryndin commented on GitHub (May 9, 2024):

I was able to make it run on the AMD 780M GPU on Windows 11 (perf increase of about 2x).

Thanks to @likelovewant for providing instructions and the specific version of Ollama.
See https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU-/issues/3

@dhiltgen Is it possible to integrate this solution into Ollama? Nothing else but that build worked.

Author
Owner

@TheophileH commented on GitHub (May 10, 2024):

> @TheophileH I'm not positive, but you may have hit #4105

Thanks for your response, @dhiltgen.
As mentioned in #4105, I used OLLAMA_VERSION=0.1.34 and set OLLAMA_TMPDIR=/usr/share/ollama/.
I still get the same log (see https://github.com/ollama/ollama/issues/3189#issuecomment-2095825726), but now it runs forever before throwing the following error:

Error: timed out waiting for llama runner to start:

PS:
I have both rocm6 and cuda12 installed on my machine.

Author
Owner

@AlexHe99 commented on GitHub (Jul 2, 2024):

@thbley @TheophileH @dhiltgen

Please follow my guide to use the iGPU 780M (gfx1103) on Linux:

https://github.com/alexhegit/Playing-with-ROCm/blob/main/inference/LLM/Run_Ollama_with_AMD_iGPU780M-QuickStart.md

> [quoted original issue body omitted; identical to the issue text at the top of this thread]


@pearsonc commented on GitHub (Jul 5, 2024):

Hey everyone, just wanted to chime in and say that I'd love to see support for AMD Radeon 780M (gfx1103) added to Ollama!

However, I think it's worth noting that this would require ROCm to add support for this specific chipset. To make this happen, I've opened a discussion and a feature request over on the ROCm GitHub page:

[ROCm Feature Radeon 780M Discussion](https://github.com/ROCm/ROCm/discussions/3360)

[ROCm Radeon 780M Feature Request](https://github.com/ROCm/ROCm/issues/3398)

If you're interested in seeing this support added, please head on over to the discussion and give it a thumbs up! Let's help push for this change and make it happen.

Maybe add a reply to the discussion highlighting your support for the feature request.

<!-- gh-comment-id:2210940145 -->

@N4S4 commented on GitHub (Aug 26, 2024):

Hello,
not sure if this is the right place. I am experiencing an issue on Ubuntu 24.04 (Ryzen 9 7940HS, Radeon 780M): when I set the override via `systemctl edit ollama.service`, it does not take effect after a restart.

Output from `journalctl -u ollama --no-pager`:

ollama[49742]: 2024/08/25 08:15:47 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:*

`HSA_OVERRIDE_GFX_VERSION:` remains empty.

Could someone help me?
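In case it helps: for a standard Linux install, environment overrides for the service are set through a systemd drop-in file. A minimal sketch, assuming the default `ollama.service` unit name (the `11.0.2` value is the 780M override discussed in this thread):

```shell
# Create a drop-in override for the ollama service. This is equivalent to
# running `sudo systemctl edit ollama.service` and pasting the [Service] block.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
EOF

# systemd only re-reads unit files after a daemon-reload; restarting the
# service alone is not enough for a new drop-in to take effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

After the restart, the `server config` line in `journalctl -u ollama` should show `HSA_OVERRIDE_GFX_VERSION:11.0.2` instead of an empty value.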

<!-- gh-comment-id:2310368016 -->

@manojparvathaneni commented on GitHub (Sep 24, 2024):

Has anyone tried this override option with the Radeon 680M? Do y'all think the workaround described here will work with that model of graphics card: https://github.com/alexhegit/Playing-with-ROCm/blob/main/inference/LLM/Run_Ollama_with_AMD_iGPU780M-QuickStart.md?

<!-- gh-comment-id:2370061088 -->

@serg-157 commented on GitHub (Jan 29, 2025):

> has anyone tried this override option with Radeon 680M? Do y'all think this workaround described here will work with this model of graphics card: https://github.com/alexhegit/Playing-with-ROCm/blob/main/inference/LLM/Run_Ollama_with_AMD_iGPU780M-QuickStart.md?

I managed to run it on 680M with HSA_OVERRIDE_GFX_VERSION="10.3.0"
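For a quick one-off test without touching the service unit, the same override can be passed inline; a sketch assuming `ollama` is on the PATH:

```shell
# Inline override for a single run. 10.3.0 forces the gfx1030 ISA, which is
# the value reported above to work for the RDNA2-based 680M.
HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve
```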

<!-- gh-comment-id:2623053323 -->

@Cognifyi commented on GitHub (Jul 4, 2025):

env

  • gpu: AMD Ryzen 7 7840HS with Radeon 780M Graphics
  • sys: Fedora 42 kernel 6.15
  • rocm: 6.4
  • default VRAM: 4G, and can't increase it in the BIOS
  • ollama: client version is 0.9.3
rocminfo

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.15
Runtime Ext Version:     1.7
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES

==========
HSA Agents
==========


*******
Agent 1
*******


Name: AMD Ryzen 7 7840HS with Radeon 780M Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 7840HS with Radeon 780M Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5137
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 28399368(0x1b15708) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 28399368(0x1b15708) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 28399368(0x1b15708) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 28399368(0x1b15708) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


*******
Agent 2
*******


Name: gfx1103
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 5567(0x15bf)
ASIC Revision: 9(0x9)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2700
BDFID: 25344
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 40
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 14199684(0xd8ab84) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 14199684(0xd8ab84) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1103
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
ISA 2
Name: amdgcn-amd-amdhsa--gfx11-generic
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32


*******
Agent 3
*******


Name: aie2
Uuid: AIE-XX
Marketing Name: AIE-ML
Vendor Name: AMD
Feature: AGENT_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 1(0x1)
Queue Min Size: 64(0x40)
Queue Max Size: 64(0x40)
Queue Type: SINGLE
Node: 0
Device Type: DSP
Cache Info:
L2: 2048(0x800) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 0(0x0)
Max Clock Freq. (MHz): 0
BDFID: 0
Internal Node ID: 0
Compute Unit: 0
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:0
Memory Properties:
Features: AGENT_DISPATCH
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, COARSE GRAINED
Size: 28399368(0x1b15708) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65536(0x10000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:0KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 28399368(0x1b15708) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*** Done ***

1. small model

1.1 cpu

OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve
 for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate:     107.20 tokens/s
eval rate:            19.98 tokens/s
prompt eval rate:     792.38 tokens/s
eval rate:            49.13 tokens/s
prompt eval rate:     771.83 tokens/s
eval rate:            47.59 tokens/s
prompt eval rate:     795.08 tokens/s
eval rate:            48.96 tokens/s
prompt eval rate:     815.03 tokens/s
eval rate:            49.15 tokens/s
prompt eval rate:     715.84 tokens/s
eval rate:            49.33 tokens/s
prompt eval rate:     760.14 tokens/s
eval rate:            47.68 tokens/s
prompt eval rate:     723.07 tokens/s
eval rate:            48.66 tokens/s
prompt eval rate:     770.71 tokens/s
eval rate:            49.64 tokens/s
prompt eval rate:     815.97 tokens/s
eval rate:            47.99 tokens/s

![Image](https://github.com/user-attachments/assets/cff583de-8029-4afc-89d1-f7f3b61e7cf9)

1.2 AMD Ryzen 7 7840HS with Radeon 780M Graphics

HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000_avx2" HIP_VISIBLE_DEVICES=0 ollama serve
time=2025-07-04T16:54:53.855+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.2 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:rocm_v60000_avx2 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/run/media/kali/Data/work/model/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-07-04T16:54:53.907+08:00 level=INFO source=images.go:476 msg="total blobs: 111"
time=2025-07-04T16:54:53.930+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-04T16:54:53.947+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-04T16:54:53.947+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-04T16:54:53.954+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-04T16:54:53.955+08:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.2
time=2025-07-04T16:54:53.955+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="4.0 GiB" available="2.7 GiB"

for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate:     125.13 tokens/s
eval rate:            64.57 tokens/s
prompt eval rate:     615.54 tokens/s
eval rate:            60.95 tokens/s
prompt eval rate:     622.10 tokens/s
eval rate:            59.60 tokens/s
prompt eval rate:     626.71 tokens/s
eval rate:            34.40 tokens/s
prompt eval rate:     307.29 tokens/s
eval rate:            43.58 tokens/s
prompt eval rate:     609.51 tokens/s
eval rate:            61.88 tokens/s
prompt eval rate:     606.04 tokens/s
eval rate:            60.33 tokens/s
prompt eval rate:     560.74 tokens/s
eval rate:            62.77 tokens/s
prompt eval rate:     528.20 tokens/s
eval rate:            32.86 tokens/s
prompt eval rate:     653.42 tokens/s
eval rate:            60.51 tokens/s

![Image](https://github.com/user-attachments/assets/1057dd0b-7ee4-45cf-ba4c-ef34d453a3e2)

2. big model

2.1 cpu

OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve
for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3n:e4b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate:     37.74 tokens/s
eval rate:            14.29 tokens/s
prompt eval rate:     224.57 tokens/s
eval rate:            14.50 tokens/s
prompt eval rate:     223.62 tokens/s
eval rate:            14.25 tokens/s
prompt eval rate:     214.52 tokens/s
eval rate:            14.41 tokens/s
prompt eval rate:     206.24 tokens/s
eval rate:            14.32 tokens/s
prompt eval rate:     220.67 tokens/s
eval rate:            14.12 tokens/s
prompt eval rate:     204.51 tokens/s
eval rate:            14.35 tokens/s
prompt eval rate:     222.51 tokens/s
eval rate:            14.04 tokens/s
prompt eval rate:     211.58 tokens/s
eval rate:            14.03 tokens/s
prompt eval rate:     209.59 tokens/s
eval rate:            14.12 tokens/s

![Image](https://github.com/user-attachments/assets/3eb52fea-b4b5-40dd-b0ac-d6863b4b28e0)

2.2 AMD Ryzen 7 7840HS with Radeon 780M Graphics

HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000_avx2" HIP_VISIBLE_DEVICES=0 ollama serve
time=2025-07-04T17:08:41.498+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.2 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:rocm_v60000_avx2 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/run/media/kali/Data/work/model/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-07-04T17:08:41.565+08:00 level=INFO source=images.go:476 msg="total blobs: 111"
time=2025-07-04T17:08:41.592+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-04T17:08:41.613+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-04T17:08:41.614+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-04T17:08:41.621+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-04T17:08:41.622+08:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.2
time=2025-07-04T17:08:41.622+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="4.0 GiB" available="2.6 GiB"
for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3n:e4b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate:     57.79 tokens/s
eval rate:            14.60 tokens/s
prompt eval rate:     205.32 tokens/s
eval rate:            14.47 tokens/s
prompt eval rate:     208.91 tokens/s
eval rate:            14.19 tokens/s
prompt eval rate:     223.63 tokens/s
eval rate:            14.21 tokens/s
prompt eval rate:     207.97 tokens/s
eval rate:            14.26 tokens/s
prompt eval rate:     215.12 tokens/s
eval rate:            14.22 tokens/s
prompt eval rate:     215.77 tokens/s
eval rate:            14.61 tokens/s
prompt eval rate:     225.22 tokens/s
eval rate:            14.56 tokens/s
prompt eval rate:     217.88 tokens/s
eval rate:            14.52 tokens/s
prompt eval rate:     224.89 tokens/s
eval rate:            14.23 tokens/s

![Image](https://github.com/user-attachments/assets/3b15d4fe-ba76-440f-8e98-17812a0ca879)

Questions

A small model (< VRAM) works well on the GPU, but a big model (> VRAM) doesn't. Why?

The BIOS can't increase the VRAM; the max is 4G.
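As an aside, the per-run rates printed by the benchmark loops above can be summarized with a short script; a minimal sketch (the `sample` string is a hypothetical excerpt in the log format shown above):

```python
import re

# Hypothetical excerpt of the `ollama run --verbose` rate lines shown above.
sample = """\
prompt eval rate:     57.79 tokens/s
eval rate:            14.60 tokens/s
prompt eval rate:     205.32 tokens/s
eval rate:            14.47 tokens/s
"""

# Match only "eval rate:" lines; "prompt eval rate:" lines do not match,
# because ^ anchors the pattern at the start of each line.
rates = [float(m.group(1))
         for m in re.finditer(r"^eval rate:\s+([\d.]+) tokens/s",
                              sample, re.MULTILINE)]
mean = sum(rates) / len(rates)
print(f"runs: {len(rates)}, mean eval rate: {mean:.2f} tokens/s")
```

Piping the full loop output into such a script makes the CPU-vs-GPU comparison a single number per configuration instead of ten lines to eyeball.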

<!-- gh-comment-id:3035216134 --> @Cognifyi commented on GitHub (Jul 4, 2025): # env * gpu: AMD Ryzen 7 7840HS with Radeon 780M Graphics * sys: Fedora 42 kernel 6.15 * rocm: 6.4 * default VRAM: 4G, and can't increase it in the BIOS * ollama: client version is 0.9.3 <details> <summary>rocminfo</summary> <pre><code> ROCk module is loaded ===================== HSA System Attributes ===================== Runtime Version: 1.15 Runtime Ext Version: 1.7 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE System Endianness: LITTLE Mwaitx: DISABLED XNACK enabled: NO DMAbuf Support: YES VMM Support: YES ========== HSA Agents ========== ******* Agent 1 ******* Name: AMD Ryzen 7 7840HS with Radeon 780M Graphics Uuid: CPU-XX Marketing Name: AMD Ryzen 7 7840HS with Radeon 780M Graphics Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 0 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 5137 BDFID: 0 Internal Node ID: 0 Compute Unit: 16 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. 
Ranges:1 Memory Properties: Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 4 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info: ******* Agent 2 ******* Name: gfx1103 Uuid: GPU-XX Marketing Name: AMD Radeon Graphics Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 1 Device Type: GPU Cache Info: L1: 32(0x20) KB L2: 2048(0x800) KB Chip ID: 5567(0x15bf) ASIC Revision: 9(0x9) Cacheline Size: 128(0x80) Max Clock Freq. (MHz): 2700 BDFID: 25344 Internal Node ID: 1 Compute Unit: 12 SIMDs per CU: 2 Shader Engines: 1 Shader Arrs. per Eng.: 2 WatchPts on Addr. 
Ranges:4 Coherent Host Access: FALSE Memory Properties: APU Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 32(0x20) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 40 SDMA engine uCode:: 21 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 14199684(0xd8ab84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:2048KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 14199684(0xd8ab84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:2048KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Recommended Granule:0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx1103 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 ISA 2 Name: amdgcn-amd-amdhsa--gfx11-generic Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier 
Max Size: 32 ******* Agent 3 ******* Name: aie2 Uuid: AIE-XX Marketing Name: AIE-ML Vendor Name: AMD Feature: AGENT_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 1(0x1) Queue Min Size: 64(0x40) Queue Max Size: 64(0x40) Queue Type: SINGLE Node: 0 Device Type: DSP Cache Info: L2: 2048(0x800) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 0(0x0) Max Clock Freq. (MHz): 0 BDFID: 0 Internal Node ID: 0 Compute Unit: 0 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:0 Memory Properties: Features: AGENT_DISPATCH Pool Info: Pool 1 Segment: GLOBAL; FLAGS: KERNARG, COARSE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 65536(0x10000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:0KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info: *** Done *** </code></pre> </details> # 1. small model ## 1.1 cpu ``` OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve for run in {1..10}; do echo "where was beethoven born?" 
| ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; done prompt eval rate: 107.20 tokens/s eval rate: 19.98 tokens/s prompt eval rate: 792.38 tokens/s eval rate: 49.13 tokens/s prompt eval rate: 771.83 tokens/s eval rate: 47.59 tokens/s prompt eval rate: 795.08 tokens/s eval rate: 48.96 tokens/s prompt eval rate: 815.03 tokens/s eval rate: 49.15 tokens/s prompt eval rate: 715.84 tokens/s eval rate: 49.33 tokens/s prompt eval rate: 760.14 tokens/s eval rate: 47.68 tokens/s prompt eval rate: 723.07 tokens/s eval rate: 48.66 tokens/s prompt eval rate: 770.71 tokens/s eval rate: 49.64 tokens/s prompt eval rate: 815.97 tokens/s eval rate: 47.99 tokens/s ``` ![Image](https://github.com/user-attachments/assets/cff583de-8029-4afc-89d1-f7f3b61e7cf9) ## 1.2 AMD Ryzen 7 7840HS with Radeon 780M Graphics ``` HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000_avx2" HIP_VISIBLE_DEVICES=0 ollama serve time=2025-07-04T16:54:53.855+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.2 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:rocm_v60000_avx2 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/run/media/kali/Data/work/model/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: 
https_proxy: no_proxy:]"
time=2025-07-04T16:54:53.907+08:00 level=INFO source=images.go:476 msg="total blobs: 111"
time=2025-07-04T16:54:53.930+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-04T16:54:53.947+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-04T16:54:53.947+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-04T16:54:53.954+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-04T16:54:53.955+08:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.2
time=2025-07-04T16:54:53.955+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="4.0 GiB" available="2.7 GiB"
```

```
for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate: 125.13 tokens/s
eval rate: 64.57 tokens/s
prompt eval rate: 615.54 tokens/s
eval rate: 60.95 tokens/s
prompt eval rate: 622.10 tokens/s
eval rate: 59.60 tokens/s
prompt eval rate: 626.71 tokens/s
eval rate: 34.40 tokens/s
prompt eval rate: 307.29 tokens/s
eval rate: 43.58 tokens/s
prompt eval rate: 609.51 tokens/s
eval rate: 61.88 tokens/s
prompt eval rate: 606.04 tokens/s
eval rate: 60.33 tokens/s
prompt eval rate: 560.74 tokens/s
eval rate: 62.77 tokens/s
prompt eval rate: 528.20 tokens/s
eval rate: 32.86 tokens/s
prompt eval rate: 653.42 tokens/s
eval rate: 60.51 tokens/s
```

![Image](https://github.com/user-attachments/assets/1057dd0b-7ee4-45cf-ba4c-ef34d453a3e2)

# 2. big model

## 2.1 cpu

```
OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve

for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3n:e4b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate: 37.74 tokens/s
eval rate: 14.29 tokens/s
prompt eval rate: 224.57 tokens/s
eval rate: 14.50 tokens/s
prompt eval rate: 223.62 tokens/s
eval rate: 14.25 tokens/s
prompt eval rate: 214.52 tokens/s
eval rate: 14.41 tokens/s
prompt eval rate: 206.24 tokens/s
eval rate: 14.32 tokens/s
prompt eval rate: 220.67 tokens/s
eval rate: 14.12 tokens/s
prompt eval rate: 204.51 tokens/s
eval rate: 14.35 tokens/s
prompt eval rate: 222.51 tokens/s
eval rate: 14.04 tokens/s
prompt eval rate: 211.58 tokens/s
eval rate: 14.03 tokens/s
prompt eval rate: 209.59 tokens/s
eval rate: 14.12 tokens/s
```

![Image](https://github.com/user-attachments/assets/3eb52fea-b4b5-40dd-b0ac-d6863b4b28e0)

## 2.2 AMD Ryzen 7 7840HS with Radeon 780M Graphics

```
HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000_avx2" HIP_VISIBLE_DEVICES=0 ollama serve
time=2025-07-04T17:08:41.498+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.2 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:rocm_v60000_avx2 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/run/media/kali/Data/work/model/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-07-04T17:08:41.565+08:00 level=INFO source=images.go:476 msg="total blobs: 111"
time=2025-07-04T17:08:41.592+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-04T17:08:41.613+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-04T17:08:41.614+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-04T17:08:41.621+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-04T17:08:41.622+08:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.2
time=2025-07-04T17:08:41.622+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="4.0 GiB" available="2.6 GiB"
```

```
for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3n:e4b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate: 57.79 tokens/s
eval rate: 14.60 tokens/s
prompt eval rate: 205.32 tokens/s
eval rate: 14.47 tokens/s
prompt eval rate: 208.91 tokens/s
eval rate: 14.19 tokens/s
prompt eval rate: 223.63 tokens/s
eval rate: 14.21 tokens/s
prompt eval rate: 207.97 tokens/s
eval rate: 14.26 tokens/s
prompt eval rate: 215.12 tokens/s
eval rate: 14.22 tokens/s
prompt eval rate: 215.77 tokens/s
eval rate: 14.61 tokens/s
prompt eval rate: 225.22 tokens/s
eval rate: 14.56 tokens/s
prompt eval rate: 217.88 tokens/s
eval rate: 14.52 tokens/s
prompt eval rate: 224.89 tokens/s
eval rate: 14.23 tokens/s
```

![Image](https://github.com/user-attachments/assets/3b15d4fe-ba76-440f-8e98-17812a0ca879)

# Questions

small model (< VRAM) works well for gpu

but big model (> VRAM) can't work well for gpu? why?

BIOS can't increase VRAM, max is 4G
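As a side note, the run-to-run eval rates above are easier to compare when aggregated. A short script (rates copied from the two gemma3n:e4b runs above) shows the CPU and GPU token rates land in the same place:

```python
import statistics

# "eval rate" values (tokens/s) copied from the gemma3n:e4b logs above
cpu_rates = [14.29, 14.50, 14.25, 14.41, 14.32, 14.12, 14.35, 14.04, 14.03, 14.12]
gpu_rates = [14.60, 14.47, 14.19, 14.21, 14.26, 14.22, 14.61, 14.56, 14.52, 14.23]

for name, rates in (("cpu_avx2", cpu_rates), ("rocm gfx1103", gpu_rates)):
    print(f"{name}: mean={statistics.mean(rates):.2f} tok/s "
          f"stdev={statistics.stdev(rates):.2f}")
```

Both configurations average roughly 14 tokens/s, which is the observation the question below is about.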
@MrUhu commented on GitHub (Jul 13, 2025):

So you wrote that you have 4 GB of VRAM available for the iGPU.

> # Questions
>
> small model (< VRAM) works well for gpu
>
> but big model (> VRAM) can't work well for gpu? why?
>
> BIOS can't increase VRAM, max is 4G

The big model is 6.9 GB, if I'm not mistaken.
6.9 GB is more than your VRAM can fit (you also need to consider that the key-value cache needs to fit in VRAM for best performance). So about 1/3 of the layers of this model are calculated on the CPU because they need to be in system memory (RAM).
The limiting factor here is therefore your CPU.

That's why the performance between the CPU and GPU test is the same.
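The back-of-envelope arithmetic behind that offload split can be sketched as follows. All numbers here are illustrative assumptions for the sketch (the ~35-layer count in particular is hypothetical, not something ollama reports), not the scheduler's actual output:

```python
# Illustrative estimate of how a model larger than VRAM gets split.
model_gib = 6.9   # approximate size of the big model (assumption)
n_layers = 35     # assumed layer count, purely for illustration
vram_gib = 4.0    # BIOS-capped VRAM from the logs above

gib_per_layer = model_gib / n_layers
layers_on_gpu = int(vram_gib / gib_per_layer)   # layers that fit in VRAM
layers_on_cpu = n_layers - layers_on_gpu        # the rest stay in system RAM

print(f"~{layers_on_gpu} layers on GPU, ~{layers_on_cpu} on CPU "
      f"({layers_on_cpu / n_layers:.0%} of the model in system RAM)")
```

In practice the GPU share is smaller still, since the KV cache and runtime overhead also have to fit in VRAM. Whatever the exact split, the CPU-resident layers bound the token rate, which is why the CPU and GPU benchmarks land on the same ~14 tokens/s.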

<!-- gh-comment-id:3066868464 -->
@lukedupin commented on GitHub (Sep 4, 2025):

Arch Linux, Framework 13, AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics

When I ran the workaround:

```
HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000" /usr/bin/ollama serve
```

I was getting a stack crash. But I uninstalled ollama and then reinstalled the `ollama` and `ollama-rocm` packages, and magically it started working. I came here to post my stack trace but found that it worked.

Hope this helps someone.
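For anyone running ollama as a systemd service (as the standard Linux install sets it up) rather than invoking `ollama serve` by hand, the override environment can be made persistent with a drop-in. A sketch, assuming the stock `ollama.service` unit:

```
# sudo systemctl edit ollama.service, then add:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"

# then reload and restart:
# sudo systemctl daemon-reload && sudo systemctl restart ollama
```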

<!-- gh-comment-id:3254296214 -->
@dhiltgen commented on GitHub (Mar 11, 2026):

Release 0.17.8 updates Linux to ROCm v7, which covers support for this GPU. Please give the [RC a try](https://github.com/ollama/ollama/blob/main/docs/linux.mdx#installing-specific-versions) and let us know if you run into any problems.

<!-- gh-comment-id:4041996867 -->
Reference: github-starred/ollama#64003