[GH-ISSUE #3189] Add support for amd Radeon 780M gfx1103 - override works #48477

GiteaMirror · 2026-04-28T08:36:25-05:00

GiteaMirror commented

2026-04-28 08:36:25 -05:00

Originally created by @thomas-0816 on GitHub (Mar 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3189

Originally assigned to: @dhiltgen on GitHub.

What are you trying to do?

Please support GPU acceleration using "AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics" on Linux (Ubuntu 22.04).

Newer notebooks ship with the AMD 7840U and support setting VRAM from 1 GB to 8 GB in the BIOS. With GPU acceleration, only 1 vCPU is used and the user experience with 7B models is quite good.

Not working ("amdgpu [0] gfx1103 is not supported"):

OLLAMA_LLM_LIBRARY="rocm_v60000" /usr/bin/ollama serve
time=2024-03-17T03:02:49.566+01:00 level=INFO source=images.go:806 msg="total blobs: 20"
time=2024-03-17T03:02:49.566+01:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-17T03:02:49.566+01:00 level=INFO source=routes.go:1110 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-17T03:02:49.566+01:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama3117537729/runners ..."
time=2024-03-17T03:02:51.423+01:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [rocm_v60000 cpu_avx cpu_avx2 cuda_v11 cpu]"
time=2024-03-17T03:02:51.424+01:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-17T03:02:51.424+01:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-17T03:02:51.427+01:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-17T03:02:51.427+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-17T03:02:51.427+01:00 level=INFO source=amd_linux.go:50 msg="AMD Driver: 6.3.6"
time=2024-03-17T03:02:51.427+01:00 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1103]"
time=2024-03-17T03:02:51.430+01:00 level=WARN source=amd_linux.go:114 msg="amdgpu [0] gfx1103 is not supported by /tmp/ollama3117537729/rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-03-17T03:02:51.430+01:00 level=WARN source=amd_linux.go:116 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-17T03:02:51.430+01:00 level=INFO source=amd_linux.go:127 msg="all detected amdgpus are skipped, falling back to CPU"
time=2024-03-17T03:02:51.430+01:00 level=INFO source=routes.go:1133 msg="no GPU detected"
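
The key line in the log above is the `amd_linux.go:114` warning, which names the detected target and the targets the bundled ROCm libraries were built for. As a rough sketch (the regular expression is written against the wording shown in this thread and is an assumption for any other Ollama version; `parse_unsupported` is a hypothetical helper, not part of Ollama), it can be picked apart like this:

```python
import re

# Pattern matching the warning text as it appears in this thread.
# Assumption: other Ollama versions may word this message differently.
UNSUPPORTED_RE = re.compile(
    r"amdgpu \[(?P<index>\d+)\] (?P<detected>gfx\w+) is not supported by "
    r"(?P<rocm_dir>.+?) \[(?P<supported>[^\]]*)\]"
)


def parse_unsupported(line):
    """Return the GPU index, detected target and supported targets, or None."""
    match = UNSUPPORTED_RE.search(line)
    if match is None:
        return None
    return {
        "index": int(match.group("index")),
        "detected": match.group("detected"),
        "supported": match.group("supported").split(),
    }


# The warning line copied from the log above.
line = (
    'time=2024-03-17T03:02:51.430+01:00 level=WARN source=amd_linux.go:114 '
    'msg="amdgpu [0] gfx1103 is not supported by /tmp/ollama3117537729/rocm '
    '[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 '
    'gfx941 gfx942]"'
)
info = parse_unsupported(line)
print(info["detected"], "gfx1102" in info["supported"])
```

Here gfx1103 is missing from the supported list while its close neighbour gfx1102 is present, which is why an override to gfx1102 is a candidate.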

My current workaround is to force gfx1102 (no issues so far):

HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000" /usr/bin/ollama serve
time=2024-03-17T03:04:54.436+01:00 level=INFO source=images.go:806 msg="total blobs: 20"
time=2024-03-17T03:04:54.436+01:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-17T03:04:54.437+01:00 level=INFO source=routes.go:1110 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-17T03:04:54.437+01:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama2902102366/runners ..."
time=2024-03-17T03:04:56.315+01:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60000 cpu cpu_avx]"
time=2024-03-17T03:04:56.315+01:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-17T03:04:56.315+01:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-17T03:04:56.317+01:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-17T03:04:56.317+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-17T03:04:56.317+01:00 level=INFO source=amd_linux.go:50 msg="AMD Driver: 6.3.6"
time=2024-03-17T03:04:56.318+01:00 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1103]"
time=2024-03-17T03:04:56.319+01:00 level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 8192M"
time=2024-03-17T03:04:56.319+01:00 level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory  8192M"

Note: using "rocm_v6" was not working for me, so I chose "rocm_v60000" (ls /tmp/ollama758986631/runners/ gives cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60000)
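
The override value follows the target name: gfx1102 becomes "11.0.2", and gfx1030 corresponds to "10.3.0". A minimal sketch of that mapping is below. `gfx_to_override` is a hypothetical helper, and treating the last two characters as hexadecimal minor and stepping digits (so that a name like gfx90a would parse) is an assumption, not something confirmed in this thread.

```python
def gfx_to_override(target):
    """Turn a target name such as 'gfx1102' into a dotted version string.

    Assumption: the last character is the stepping, the one before it is the
    minor version (both read as hex digits), and the rest is the major version.
    """
    if not target.startswith("gfx"):
        raise ValueError(f"not a gfx target name: {target!r}")
    digits = target[3:]
    if len(digits) < 3:
        raise ValueError(f"target name too short: {target!r}")
    major = int(digits[:-2])
    minor = int(digits[-2], 16)
    stepping = int(digits[-1], 16)
    return f"{major}.{minor}.{stepping}"


print(gfx_to_override("gfx1102"))
print(gfx_to_override("gfx1030"))
```

The helper only formats the string. Whether a given override actually works depends on how compatible the real GPU is with the target it is told to impersonate.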

Some benchmarks using the 7840U (numbers from the second run), Ubuntu 22.04, kernel 6.5, VRAM set to 8 GB in the BIOS:

> CPU
OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve
ollama run llama2:latest "where was beethoven born?" --verbose
Ludwig van Beethoven was born in Bonn, Germany on December 16, 1770.
total duration:       4.343514826s
load duration:        264.691µs
prompt eval duration: 168.205ms
prompt eval rate:     0.00 tokens/s
eval count:           26 token(s)
eval duration:        4.174563s
eval rate:            6.23 tokens/s

> GPU
HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000" /usr/bin/ollama serve
ollama run llama2:latest "where was beethoven born?" --verbose
Ludwig van Beethoven was born in Bonn, Germany on December 16, 1770.
total duration:       1.513455927s
load duration:        161.535µs
prompt eval duration: 65.979ms
prompt eval rate:     0.00 tokens/s
eval count:           27 token(s)
eval duration:        1.446805s
eval rate:            18.66 tokens/s
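
For reference, the reported eval rates are simply eval count divided by eval duration, and the two runs can be compared directly. The short sketch below redoes that arithmetic from the numbers above; `eval_rate` is just an illustrative name.

```python
def eval_rate(tokens, seconds):
    """Tokens generated per second of eval time."""
    return tokens / seconds


# Numbers copied from the two runs above.
cpu_rate = eval_rate(26, 4.174563)
gpu_rate = eval_rate(27, 1.446805)

# Ratio of generation speed, and ratio of end-to-end wall time.
rate_speedup = gpu_rate / cpu_rate
total_speedup = 4.343514826 / 1.513455927

print(f"CPU {cpu_rate:.2f} tokens/s, GPU {gpu_rate:.2f} tokens/s")
print(f"eval rate speedup {rate_speedup:.1f}x, total duration speedup {total_speedup:.1f}x")
```

On this single short prompt, the GPU run generates tokens about three times as fast as the CPU run.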

How should we solve this?

No response

What is the impact of not solving this?

No response

Anything else?

No response

GiteaMirror added the amd feature request linux windows labels 2026-04-28 08:36:26 -05:00

GiteaMirror closed this issue

2026-04-28 08:36:30 -05:00

GiteaMirror commented

2026-04-28 08:36:33 -05:00

@dhiltgen commented on GitHub (Mar 20, 2024):

Glad to hear the override is working. You shouldn't need to set OLLAMA_LLM_LIBRARY - it should auto-detect that the Radeon GPU is present and use the correct library; only the HSA_OVERRIDE_GFX_VERSION should be required for your setup.

At the moment, we're not planning to auto-override as this can lead to crashes in some scenarios, so we're steering users to manually set this to ensure they understand what's going on. AMD is working on adding more general support for families of GPUs which should eliminate the need for this override approach.
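
In other words, the server only needs HSA_OVERRIDE_GFX_VERSION in its environment, with OLLAMA_LLM_LIBRARY left unset. A minimal sketch of starting it that way from a script follows. `serve_env` and `start_server` are hypothetical helper names, and it assumes an `ollama` binary on PATH and the "11.0.2" value that worked for the 780M earlier in this thread.

```python
import os
import subprocess


def serve_env(override, base=None):
    """Copy an environment, set the override and drop any forced library."""
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = override
    # Leave library selection to auto-detection, as suggested above.
    env.pop("OLLAMA_LLM_LIBRARY", None)
    return env


def start_server(override="11.0.2"):
    """Start `ollama serve` with the override set (assumes ollama is on PATH)."""
    return subprocess.Popen(["ollama", "serve"], env=serve_env(override))


# Show the effect on a small example environment without starting anything.
example = serve_env(
    "11.0.2",
    base={"PATH": "/usr/bin", "OLLAMA_LLM_LIBRARY": "rocm_v60000"},
)
print(sorted(example.items()))
```

Calling `start_server()` would launch the server itself. It is left uncalled here so the snippet can be run safely on a machine without Ollama installed.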

GiteaMirror commented

2026-04-28 08:36:34 -05:00

@wowpala commented on GitHub (Mar 23, 2024):

Same with me, on Windows.

Please also add support on Windows.

$env:OLLAMA_ORIGINS="app://obsidian.md*"; ollama serve
time=2024-03-24T02:21:24.575+08:00 level=INFO source=images.go:806 msg="total blobs: 8"
time=2024-03-24T02:21:24.599+08:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-24T02:21:24.601+08:00 level=INFO source=routes.go:1110 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-24T02:21:24.601+08:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to C:\\Users\\Jeff\\AppData\\Local\\Temp\\ollama3340791837\\runners ..."
time=2024-03-24T02:21:24.654+08:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu cpu_avx cuda_v11.3 rocm_v5.7 cpu_avx2]"
time=2024-03-24T02:21:29.415+08:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-24T02:21:29.415+08:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library nvml.dll"
time=2024-03-24T02:21:29.424+08:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-24T02:21:29.424+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-24T02:21:29.442+08:00 level=INFO source=amd_windows.go:40 msg="AMD Driver: 50422970"
time=2024-03-24T02:21:29.455+08:00 level=INFO source=amd_windows.go:69 msg="detected 1 hip devices"
time=2024-03-24T02:21:29.455+08:00 level=INFO source=amd_windows.go:87 msg="[0] Name: AMD Radeon(TM) 780M"
time=2024-03-24T02:21:29.455+08:00 level=INFO source=amd_windows.go:90 msg="[0] GcnArchName: gfx1103"
time=2024-03-24T02:21:29.455+08:00 level=WARN source=amd_windows.go:100 msg="amdgpu [0] gfx1103 is not supported by C:\\Users\\Jeff\\AppData\\Local\\Programs\\Ollama\\rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx906]"
time=2024-03-24T02:21:29.455+08:00 level=WARN source=amd_windows.go:102 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-24T02:21:29.455+08:00 level=INFO source=amd_windows.go:128 msg="all detected amdgpus are skipped, falling back to CPU"
time=2024-03-24T02:21:29.510+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-24T02:21:29.526+08:00 level=INFO source=amd_windows.go:40 msg="AMD Driver: 50422970"
time=2024-03-24T02:21:29.527+08:00 level=INFO source=amd_windows.go:69 msg="detected 1 hip devices"
time=2024-03-24T02:21:29.527+08:00 level=INFO source=amd_windows.go:87 msg="[0] Name: AMD Radeon(TM) 780M"
time=2024-03-24T02:21:29.527+08:00 level=INFO source=amd_windows.go:90 msg="[0] GcnArchName: gfx1103"
time=2024-03-24T02:21:29.527+08:00 level=WARN source=amd_windows.go:100 msg="amdgpu [0] gfx1103 is not supported by C:\\Users\\Jeff\\AppData\\Local\\Programs\\Ollama\\rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx906]"
time=2024-03-24T02:21:29.528+08:00 level=WARN source=amd_windows.go:102 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-24T02:21:29.528+08:00 level=INFO source=amd_windows.go:128 msg="all detected amdgpus are skipped, falling back to CPU"
time=2024-03-24T02:21:29.590+08:00 level=INFO source=llm.go:85 msg="GPU not available, falling back to CPU"

GiteaMirror commented

2026-04-28 08:36:35 -05:00

@zlwu commented on GitHub (May 4, 2024):

Cool, the 7840U has AVX512. Are there any improvements over AVX2?

GiteaMirror commented

2026-04-28 08:36:36 -05:00

@Thanos-CP commented on GitHub (May 5, 2024):

Hey, I have the same problem: my AMD 780M didn't get recognized by Ubuntu, and you also can't use ROCm with this GPU. Could you maybe help me fix this? I'm new to this and don't know what to do. Thank you.

GiteaMirror commented

2026-04-28 08:36:38 -05:00

@dhiltgen commented on GitHub (May 5, 2024):

To clarify, unfortunately the override is not supported on Windows due to ROCm limitations. That's tracked via #3107

GiteaMirror commented

2026-04-28 08:36:39 -05:00

@TheophileH commented on GitHub (May 6, 2024):

Hello,
The override HSA_OVERRIDE_GFX_VERSION=10.3.0 doesn't work for me.

I'm using
AMD GPU: gfx1032 (x2)
OS: Ubuntu 22.04
CPU with no AVX support

I got the following error:

May 06 09:29:20 user ollama[1550]: time=2024-05-06T09:29:20.541Z level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
May 06 09:29:20 user ollama[1550]: time=2024-05-06T09:29:20.592Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:37055/health\": dial tcp 127.0.0.1:37055: connect: connection refused"
May 06 09:29:21 user ollama[1550]: time=2024-05-06T09:29:21.091Z level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: -1 "
May 06 09:29:21 user ollama[1550]: time=2024-05-06T09:29:21.092Z level=DEBUG source=server.go:832 msg="stopping llama server"

The full log is available here: https://drive.google.com/file/d/1mOaelrj9qxqajab3YP1F5OBnhJpXqI9S/view?usp=sharing

Please, let me know what I should modify.

GiteaMirror commented

2026-04-28 08:36:40 -05:00

@dhiltgen commented on GitHub (May 7, 2024):

@TheophileH I'm not positive, but you may have hit #4105

GiteaMirror commented

2026-04-28 08:36:40 -05:00

@bryndin commented on GitHub (May 9, 2024):

I was able to make it run on the AMD 780M GPU on Windows 11 (a performance increase of about 2x).

Thanks to @likelovewant for providing instructions and the specific version of Ollama.
See https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU-/issues/3

@dhiltgen Is it possible to integrate this solution into Ollama? Nothing else but that build worked.

GiteaMirror commented

2026-04-28 08:36:41 -05:00

@TheophileH commented on GitHub (May 10, 2024):

> @TheophileH I'm not positive, but you may have hit #4105

Thanks for your response, @dhiltgen.
As mentioned in #4105, I used OLLAMA_VERSION=0.1.34 and set OLLAMA_TMPDIR=/usr/share/ollama/.
I still get the same log (https://github.com/ollama/ollama/issues/3189#issuecomment-2095825726), but now it runs forever before throwing the following error:

Error: timed out waiting for llama runner to start:

PS:
I have both rocm6 and cuda12 installed on my machine.

GiteaMirror commented

2026-04-28 08:36:42 -05:00

@AlexHe99 commented on GitHub (Jul 2, 2024):

@thbley @TheophileH @dhiltgen

Please follow my guide to use the 780M iGPU (gfx1103) on Linux.

https://github.com/alexhegit/Playing-with-ROCm/blob/main/inference/LLM/Run_Ollama_with_AMD_iGPU780M-QuickStart.md


GiteaMirror commented

2026-04-28 08:36:43 -05:00

@pearsonc commented on GitHub (Jul 5, 2024):

Hey everyone, just wanted to chime in and say that I'd love to see support for AMD Radeon 780M (gfx1103) added to Ollama!

However, I think it's worth noting that this would require ROCm to add support for this specific chipset. To make this happen, I've opened a discussion and a feature request over on the ROCm GitHub page:

ROCm Feature Radeon 780M Discussion: https://github.com/ROCm/ROCm/discussions/3360

ROCm Radeon 780M Feature Request: https://github.com/ROCm/ROCm/issues/3398

If you're interested in seeing this support added, please head over to the discussion and give it a thumbs up! Let's help push for this change and make it happen.

You could also add a reply to the discussion to voice your support for the feature request.

@N4S4 commented on GitHub (Aug 26, 2024):

Hello,
not sure if this is the right place. I am experiencing an issue: when I set the override with systemctl edit ollama.service, it does not take effect after a restart on Ubuntu 24 (Ryzen 9 7940HS, Radeon 780M).

output from journalctl -u ollama --no-pager

ollama[49742]: 2024/08/25 08:15:47 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:*

HSA_OVERRIDE_GFX_VERSION: remains empty

could someone help me?
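
One thing worth checking for the question above is where the variable ends up in the unit override. The following is a minimal sketch, assuming the service is installed as ollama.service and that the value 11.0.2 from earlier in this thread is the one wanted. The function name write_ollama_dropin and its arguments are made up for illustration; the drop-in path in the usage comment is the one systemctl edit normally writes to. It does not diagnose why the override was ignored in this particular case.

```shell
# Sketch: write a systemd drop-in that sets HSA_OVERRIDE_GFX_VERSION.
# write_ollama_dropin is a hypothetical helper, not part of Ollama or systemd.
#   $1 = drop-in directory, $2 = override value (e.g. 11.0.2)
write_ollama_dropin() {
    dir="$1"
    version="$2"
    mkdir -p "$dir" || return 1
    # The Environment= line has to sit under a [Service] header.
    printf '[Service]\nEnvironment="HSA_OVERRIDE_GFX_VERSION=%s"\n' "$version" \
        > "$dir/override.conf"
}

# Usage (as root):
#   write_ollama_dropin /etc/systemd/system/ollama.service.d 11.0.2
#   systemctl daemon-reload
#   systemctl restart ollama
#   systemctl show ollama --property=Environment
```

If the last command prints the variable but the "server config" line in journalctl still shows it empty, the log being read may predate the restart.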

@manojparvathaneni commented on GitHub (Sep 24, 2024):

Has anyone tried this override option with the Radeon 680M? Do y'all think the workaround described here will work with this model of graphics card: https://github.com/alexhegit/Playing-with-ROCm/blob/main/inference/LLM/Run_Ollama_with_AMD_iGPU780M-QuickStart.md?

@serg-157 commented on GitHub (Jan 29, 2025):

> Has anyone tried this override option with the Radeon 680M? Do y'all think the workaround described here will work with this model of graphics card: https://github.com/alexhegit/Playing-with-ROCm/blob/main/inference/LLM/Run_Ollama_with_AMD_iGPU780M-QuickStart.md?

I managed to run it on 680M with HSA_OVERRIDE_GFX_VERSION="10.3.0"
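
For reference, the two override values reported as working in this thread can be collected in one place. This is only a sketch: gfx_override_for is a hypothetical helper, and the table holds nothing beyond what posters here reported (11.0.2 for the 780M, 10.3.0 for the 680M). Other iGPUs are deliberately left out rather than guessed.

```shell
# Sketch: look up the HSA_OVERRIDE_GFX_VERSION value reported in this thread.
# gfx_override_for is a made-up name; the values come from the comments above.
gfx_override_for() {
    case "$1" in
        780M) echo "11.0.2" ;;  # gfx1103, reported by the issue author
        680M) echo "10.3.0" ;;  # reported by @serg-157
        *)    return 1 ;;       # unknown model: no value reported here
    esac
}

# Usage:
#   HSA_OVERRIDE_GFX_VERSION="$(gfx_override_for 680M)" ollama serve
```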

@Cognifyi commented on GitHub (Jul 4, 2025):

env

gpu: AMD Ryzen 7 7840HS with Radeon 780M Graphics
sys: Fedora 42 kernel 6.15
rocm: 6.4
default VRAM: 4G, and can't increase it in the BIOS
ollama: client version is 0.9.3

rocminfo

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.15
Runtime Ext Version: 1.7
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name: AMD Ryzen 7 7840HS with Radeon 780M Graphics
  Uuid: CPU-XX
  Marketing Name: AMD Ryzen 7 7840HS with Radeon 780M Graphics
  Vendor Name: CPU
  Feature: None specified
  Profile: FULL_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 0(0x0)
  Queue Min Size: 0(0x0)
  Queue Max Size: 0(0x0)
  Queue Type: MULTI
  Node: 0
  Device Type: CPU
  Cache Info:
    L1: 32768(0x8000) KB
  Chip ID: 0(0x0)
  ASIC Revision: 0(0x0)
  Cacheline Size: 64(0x40)
  Max Clock Freq. (MHz): 5137
  BDFID: 0
  Internal Node ID: 0
  Compute Unit: 16
  SIMDs per CU: 0
  Shader Engines: 0
  Shader Arrs. per Eng.: 0
  WatchPts on Addr. Ranges:1
  Memory Properties:
  Features: None
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: FINE GRAINED
      Size: 28399368(0x1b15708) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 2
      Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size: 28399368(0x1b15708) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 3
      Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size: 28399368(0x1b15708) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 4
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 28399368(0x1b15708) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
  ISA Info:
*******
Agent 2
*******
  Name: gfx1103
  Uuid: GPU-XX
  Marketing Name: AMD Radeon Graphics
  Vendor Name: AMD
  Feature: KERNEL_DISPATCH
  Profile: BASE_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 128(0x80)
  Queue Min Size: 64(0x40)
  Queue Max Size: 131072(0x20000)
  Queue Type: MULTI
  Node: 1
  Device Type: GPU
  Cache Info:
    L1: 32(0x20) KB
    L2: 2048(0x800) KB
  Chip ID: 5567(0x15bf)
  ASIC Revision: 9(0x9)
  Cacheline Size: 128(0x80)
  Max Clock Freq. (MHz): 2700
  BDFID: 25344
  Internal Node ID: 1
  Compute Unit: 12
  SIMDs per CU: 2
  Shader Engines: 1
  Shader Arrs. per Eng.: 2
  WatchPts on Addr. Ranges:4
  Coherent Host Access: FALSE
  Memory Properties: APU
  Features: KERNEL_DISPATCH
  Fast F16 Operation: TRUE
  Wavefront Size: 32(0x20)
  Workgroup Max Size: 1024(0x400)
  Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400)
  Max Waves Per CU: 32(0x20)
  Max Work-item Per CU: 1024(0x400)
  Grid Max Size: 4294967295(0xffffffff)
  Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff)
  Max fbarriers/Workgrp: 32
  Packet Processor uCode:: 40
  SDMA engine uCode:: 21
  IOMMU Support:: None
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 14199684(0xd8ab84) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment: 4KB
      Accessible by all: FALSE
    Pool 2
      Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size: 14199684(0xd8ab84) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment: 4KB
      Accessible by all: FALSE
    Pool 3
      Segment: GROUP
      Size: 64(0x40) KB
      Allocatable: FALSE
      Alloc Granule: 0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment: 0KB
      Accessible by all: FALSE
  ISA Info:
    ISA 1
      Name: amdgcn-amd-amdhsa--gfx1103
      Machine Models: HSA_MACHINE_MODEL_LARGE
      Profiles: HSA_PROFILE_BASE
      Default Rounding Mode: NEAR
      Default Rounding Mode: NEAR
      Fast f16: TRUE
      Workgroup Max Size: 1024(0x400)
      Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400)
      Grid Max Size: 4294967295(0xffffffff)
      Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff)
      FBarrier Max Size: 32
    ISA 2
      Name: amdgcn-amd-amdhsa--gfx11-generic
      Machine Models: HSA_MACHINE_MODEL_LARGE
      Profiles: HSA_PROFILE_BASE
      Default Rounding Mode: NEAR
      Default Rounding Mode: NEAR
      Fast f16: TRUE
      Workgroup Max Size: 1024(0x400)
      Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400)
      Grid Max Size: 4294967295(0xffffffff)
      Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff)
      FBarrier Max Size: 32
*******
Agent 3
*******
  Name: aie2
  Uuid: AIE-XX
  Marketing Name: AIE-ML
  Vendor Name: AMD
  Feature: AGENT_DISPATCH
  Profile: BASE_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 1(0x1)
  Queue Min Size: 64(0x40)
  Queue Max Size: 64(0x40)
  Queue Type: SINGLE
  Node: 0
  Device Type: DSP
  Cache Info:
    L2: 2048(0x800) KB
  Chip ID: 0(0x0)
  ASIC Revision: 0(0x0)
  Cacheline Size: 0(0x0)
  Max Clock Freq. (MHz): 0
  BDFID: 0
  Internal Node ID: 0
  Compute Unit: 0
  SIMDs per CU: 0
  Shader Engines: 0
  Shader Arrs. per Eng.: 0
  WatchPts on Addr. Ranges:0
  Memory Properties:
  Features: AGENT_DISPATCH
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: KERNARG, COARSE GRAINED
      Size: 28399368(0x1b15708) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 2
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 65536(0x10000) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:0KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 3
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 28399368(0x1b15708) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
  ISA Info:
*** Done ***

1. small model

1.1 cpu

OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve
for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate:     107.20 tokens/s
eval rate:            19.98 tokens/s
prompt eval rate:     792.38 tokens/s
eval rate:            49.13 tokens/s
prompt eval rate:     771.83 tokens/s
eval rate:            47.59 tokens/s
prompt eval rate:     795.08 tokens/s
eval rate:            48.96 tokens/s
prompt eval rate:     815.03 tokens/s
eval rate:            49.15 tokens/s
prompt eval rate:     715.84 tokens/s
eval rate:            49.33 tokens/s
prompt eval rate:     760.14 tokens/s
eval rate:            47.68 tokens/s
prompt eval rate:     723.07 tokens/s
eval rate:            48.66 tokens/s
prompt eval rate:     770.71 tokens/s
eval rate:            49.64 tokens/s
prompt eval rate:     815.97 tokens/s
eval rate:            47.99 tokens/s

1.2 AMD Ryzen 7 7840HS with Radeon 780M Graphics

HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000_avx2" HIP_VISIBLE_DEVICES=0 ollama serve
time=2025-07-04T16:54:53.855+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.2 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:rocm_v60000_avx2 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/run/media/kali/Data/work/model/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-07-04T16:54:53.907+08:00 level=INFO source=images.go:476 msg="total blobs: 111"
time=2025-07-04T16:54:53.930+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-04T16:54:53.947+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-04T16:54:53.947+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-04T16:54:53.954+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-04T16:54:53.955+08:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.2
time=2025-07-04T16:54:53.955+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="4.0 GiB" available="2.7 GiB"

for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate:     125.13 tokens/s
eval rate:            64.57 tokens/s
prompt eval rate:     615.54 tokens/s
eval rate:            60.95 tokens/s
prompt eval rate:     622.10 tokens/s
eval rate:            59.60 tokens/s
prompt eval rate:     626.71 tokens/s
eval rate:            34.40 tokens/s
prompt eval rate:     307.29 tokens/s
eval rate:            43.58 tokens/s
prompt eval rate:     609.51 tokens/s
eval rate:            61.88 tokens/s
prompt eval rate:     606.04 tokens/s
eval rate:            60.33 tokens/s
prompt eval rate:     560.74 tokens/s
eval rate:            62.77 tokens/s
prompt eval rate:     528.20 tokens/s
eval rate:            32.86 tokens/s
prompt eval rate:     653.42 tokens/s
eval rate:            60.51 tokens/s

2. big model

2.1 cpu

OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve
for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3n:e4b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate:     37.74 tokens/s
eval rate:            14.29 tokens/s
prompt eval rate:     224.57 tokens/s
eval rate:            14.50 tokens/s
prompt eval rate:     223.62 tokens/s
eval rate:            14.25 tokens/s
prompt eval rate:     214.52 tokens/s
eval rate:            14.41 tokens/s
prompt eval rate:     206.24 tokens/s
eval rate:            14.32 tokens/s
prompt eval rate:     220.67 tokens/s
eval rate:            14.12 tokens/s
prompt eval rate:     204.51 tokens/s
eval rate:            14.35 tokens/s
prompt eval rate:     222.51 tokens/s
eval rate:            14.04 tokens/s
prompt eval rate:     211.58 tokens/s
eval rate:            14.03 tokens/s
prompt eval rate:     209.59 tokens/s
eval rate:            14.12 tokens/s

2.2 AMD Ryzen 7 7840HS with Radeon 780M Graphics

HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000_avx2" HIP_VISIBLE_DEVICES=0 ollama serve
time=2025-07-04T17:08:41.498+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.2 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:rocm_v60000_avx2 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/run/media/kali/Data/work/model/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-07-04T17:08:41.565+08:00 level=INFO source=images.go:476 msg="total blobs: 111"
time=2025-07-04T17:08:41.592+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-04T17:08:41.613+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-04T17:08:41.614+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-04T17:08:41.621+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-04T17:08:41.622+08:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.2
time=2025-07-04T17:08:41.622+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="4.0 GiB" available="2.6 GiB"

for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3n:e4b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate:     57.79 tokens/s
eval rate:            14.60 tokens/s
prompt eval rate:     205.32 tokens/s
eval rate:            14.47 tokens/s
prompt eval rate:     208.91 tokens/s
eval rate:            14.19 tokens/s
prompt eval rate:     223.63 tokens/s
eval rate:            14.21 tokens/s
prompt eval rate:     207.97 tokens/s
eval rate:            14.26 tokens/s
prompt eval rate:     215.12 tokens/s
eval rate:            14.22 tokens/s
prompt eval rate:     215.77 tokens/s
eval rate:            14.61 tokens/s
prompt eval rate:     225.22 tokens/s
eval rate:            14.56 tokens/s
prompt eval rate:     217.88 tokens/s
eval rate:            14.52 tokens/s
prompt eval rate:     224.89 tokens/s
eval rate:            14.23 tokens/s
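
The four runs above are easier to compare as averages. Here is a minimal sketch that reads the same grep output on stdin and prints the mean generation rate. mean_eval_rate is a hypothetical helper name, and its optional argument (how many leading runs to skip, on the assumption that the first run includes warm-up) is an illustration rather than anything Ollama provides.

```shell
# Sketch: average the "eval rate:" lines from `ollama run --verbose` output.
# Lines starting with "prompt eval rate:" are ignored by the ^eval anchor.
#   $1 = number of leading runs to skip (default 0)
mean_eval_rate() {
    awk -v skip="${1:-0}" '
        /^eval rate:/ {
            seen++
            if (seen > skip) { sum += $3; n++ }
        }
        END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Usage, skipping the first run:
#   for run in {1..10}; do echo "where was beethoven born?" | \
#       ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; \
#   done | mean_eval_rate 1
```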

Questions

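A small model (smaller than VRAM) runs well on the GPU.

But a big model (larger than VRAM) gains nothing from the GPU: gemma3n:e4b generates about 14 tokens/s on both CPU and GPU. Why?

The BIOS does not allow increasing VRAM; the maximum is 4G.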

@Cognifyi commented on GitHub (Jul 4, 2025): # env * gpu: AMD Ryzen 7 7840HS with Radeon 780M Graphics * sys: Fedora 42 kernel 6.15 * rocm: 6.4 * default VRAM: 4G, and can't increase it in the BIOS * ollama: client version is 0.9.3 <details> <summary>rocminfo</summary> <pre><code> ROCk module is loaded ===================== HSA System Attributes ===================== Runtime Version: 1.15 Runtime Ext Version: 1.7 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE System Endianness: LITTLE Mwaitx: DISABLED XNACK enabled: NO DMAbuf Support: YES VMM Support: YES ========== HSA Agents ========== ******* Agent 1 ******* Name: AMD Ryzen 7 7840HS with Radeon 780M Graphics Uuid: CPU-XX Marketing Name: AMD Ryzen 7 7840HS with Radeon 780M Graphics Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 0 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 5137 BDFID: 0 Internal Node ID: 0 Compute Unit: 16 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. 
Ranges:1 Memory Properties: Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 4 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info: ******* Agent 2 ******* Name: gfx1103 Uuid: GPU-XX Marketing Name: AMD Radeon Graphics Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 1 Device Type: GPU Cache Info: L1: 32(0x20) KB L2: 2048(0x800) KB Chip ID: 5567(0x15bf) ASIC Revision: 9(0x9) Cacheline Size: 128(0x80) Max Clock Freq. (MHz): 2700 BDFID: 25344 Internal Node ID: 1 Compute Unit: 12 SIMDs per CU: 2 Shader Engines: 1 Shader Arrs. per Eng.: 2 WatchPts on Addr. 
Ranges:4 Coherent Host Access: FALSE Memory Properties: APU Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 32(0x20) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 40 SDMA engine uCode:: 21 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 14199684(0xd8ab84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:2048KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 14199684(0xd8ab84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:2048KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Recommended Granule:0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx1103 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 ISA 2 Name: amdgcn-amd-amdhsa--gfx11-generic Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier 
Max Size: 32 ******* Agent 3 ******* Name: aie2 Uuid: AIE-XX Marketing Name: AIE-ML Vendor Name: AMD Feature: AGENT_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 1(0x1) Queue Min Size: 64(0x40) Queue Max Size: 64(0x40) Queue Type: SINGLE Node: 0 Device Type: DSP Cache Info: L2: 2048(0x800) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 0(0x0) Max Clock Freq. (MHz): 0 BDFID: 0 Internal Node ID: 0 Compute Unit: 0 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:0 Memory Properties: Features: AGENT_DISPATCH Pool Info: Pool 1 Segment: GLOBAL; FLAGS: KERNARG, COARSE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 65536(0x10000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:0KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 28399368(0x1b15708) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Recommended Granule:4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info: *** Done *** </code></pre> </details> # 1. small model ## 1.1 cpu ``` OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve for run in {1..10}; do echo "where was beethoven born?" 
| ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; done prompt eval rate: 107.20 tokens/s eval rate: 19.98 tokens/s prompt eval rate: 792.38 tokens/s eval rate: 49.13 tokens/s prompt eval rate: 771.83 tokens/s eval rate: 47.59 tokens/s prompt eval rate: 795.08 tokens/s eval rate: 48.96 tokens/s prompt eval rate: 815.03 tokens/s eval rate: 49.15 tokens/s prompt eval rate: 715.84 tokens/s eval rate: 49.33 tokens/s prompt eval rate: 760.14 tokens/s eval rate: 47.68 tokens/s prompt eval rate: 723.07 tokens/s eval rate: 48.66 tokens/s prompt eval rate: 770.71 tokens/s eval rate: 49.64 tokens/s prompt eval rate: 815.97 tokens/s eval rate: 47.99 tokens/s ``` ![Image](https://github.com/user-attachments/assets/cff583de-8029-4afc-89d1-f7f3b61e7cf9) ## 1.2 AMD Ryzen 7 7840HS with Radeon 780M Graphics ``` HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000_avx2" HIP_VISIBLE_DEVICES=0 ollama serve time=2025-07-04T16:54:53.855+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.2 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:rocm_v60000_avx2 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/run/media/kali/Data/work/model/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: 
https_proxy: no_proxy:]"
time=2025-07-04T16:54:53.907+08:00 level=INFO source=images.go:476 msg="total blobs: 111"
time=2025-07-04T16:54:53.930+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-04T16:54:53.947+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-04T16:54:53.947+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-04T16:54:53.954+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-04T16:54:53.955+08:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.2
time=2025-07-04T16:54:53.955+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="4.0 GiB" available="2.7 GiB"
```

```
for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3:1b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate: 125.13 tokens/s
eval rate: 64.57 tokens/s
prompt eval rate: 615.54 tokens/s
eval rate: 60.95 tokens/s
prompt eval rate: 622.10 tokens/s
eval rate: 59.60 tokens/s
prompt eval rate: 626.71 tokens/s
eval rate: 34.40 tokens/s
prompt eval rate: 307.29 tokens/s
eval rate: 43.58 tokens/s
prompt eval rate: 609.51 tokens/s
eval rate: 61.88 tokens/s
prompt eval rate: 606.04 tokens/s
eval rate: 60.33 tokens/s
prompt eval rate: 560.74 tokens/s
eval rate: 62.77 tokens/s
prompt eval rate: 528.20 tokens/s
eval rate: 32.86 tokens/s
prompt eval rate: 653.42 tokens/s
eval rate: 60.51 tokens/s
```

![Image](https://github.com/user-attachments/assets/1057dd0b-7ee4-45cf-ba4c-ef34d453a3e2)

# 2. big model

## 2.1 cpu

```
OLLAMA_LLM_LIBRARY="cpu_avx2" /usr/bin/ollama serve

for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3n:e4b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate: 37.74 tokens/s
eval rate: 14.29 tokens/s
prompt eval rate: 224.57 tokens/s
eval rate: 14.50 tokens/s
prompt eval rate: 223.62 tokens/s
eval rate: 14.25 tokens/s
prompt eval rate: 214.52 tokens/s
eval rate: 14.41 tokens/s
prompt eval rate: 206.24 tokens/s
eval rate: 14.32 tokens/s
prompt eval rate: 220.67 tokens/s
eval rate: 14.12 tokens/s
prompt eval rate: 204.51 tokens/s
eval rate: 14.35 tokens/s
prompt eval rate: 222.51 tokens/s
eval rate: 14.04 tokens/s
prompt eval rate: 211.58 tokens/s
eval rate: 14.03 tokens/s
prompt eval rate: 209.59 tokens/s
eval rate: 14.12 tokens/s
```

![Image](https://github.com/user-attachments/assets/3eb52fea-b4b5-40dd-b0ac-d6863b4b28e0)

## 2.2 AMD Ryzen 7 7840HS with Radeon 780M Graphics

```
HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000_avx2" HIP_VISIBLE_DEVICES=0 ollama serve
time=2025-07-04T17:08:41.498+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION:11.0.2 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:rocm_v60000_avx2 OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/run/media/kali/Data/work/model/.ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-07-04T17:08:41.565+08:00 level=INFO source=images.go:476 msg="total blobs: 111"
time=2025-07-04T17:08:41.592+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-04T17:08:41.613+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-04T17:08:41.614+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-04T17:08:41.621+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-04T17:08:41.622+08:00 level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.2
time=2025-07-04T17:08:41.622+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="4.0 GiB" available="2.6 GiB"
```

```
for run in {1..10}; do echo "where was beethoven born?" | ollama run gemma3n:e4b --verbose 2>&1 >/dev/null | grep "eval rate:"; done
prompt eval rate: 57.79 tokens/s
eval rate: 14.60 tokens/s
prompt eval rate: 205.32 tokens/s
eval rate: 14.47 tokens/s
prompt eval rate: 208.91 tokens/s
eval rate: 14.19 tokens/s
prompt eval rate: 223.63 tokens/s
eval rate: 14.21 tokens/s
prompt eval rate: 207.97 tokens/s
eval rate: 14.26 tokens/s
prompt eval rate: 215.12 tokens/s
eval rate: 14.22 tokens/s
prompt eval rate: 215.77 tokens/s
eval rate: 14.61 tokens/s
prompt eval rate: 225.22 tokens/s
eval rate: 14.56 tokens/s
prompt eval rate: 217.88 tokens/s
eval rate: 14.52 tokens/s
prompt eval rate: 224.89 tokens/s
eval rate: 14.23 tokens/s
```

![Image](https://github.com/user-attachments/assets/3b15d4fe-ba76-440f-8e98-17812a0ca879)

# Questions

small model (< VRAM) works well for gpu

but big model (> VRAM) can't work well for gpu? why?

BIOS can't increase VRAM, max is 4G
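
To compare the runs above without eyeballing ten lines each, the `eval rate` output can be averaged. The following is a minimal Python sketch, not part of the original post: `summarize_rates` is a hypothetical helper that parses lines in the `... eval rate: N tokens/s` form shown above, and the two lists are the `eval rate` values copied from the gemma3n:e4b CPU and GPU runs.

```python
import re
from statistics import mean

# Matches the two line shapes that grep "eval rate:" lets through.
RATE_LINE = re.compile(
    r"^\s*(prompt eval rate|eval rate):\s+([0-9.]+)\s+tokens/s\s*$"
)


def summarize_rates(output: str) -> dict[str, float]:
    """Return the mean of each rate found in `ollama run --verbose` output."""
    samples: dict[str, list[float]] = {}
    for line in output.splitlines():
        match = RATE_LINE.match(line)
        if match:
            samples.setdefault(match.group(1), []).append(float(match.group(2)))
    return {name: mean(values) for name, values in samples.items()}


# "eval rate" values copied from the gemma3n:e4b runs in this comment.
CPU_EVAL = [14.29, 14.50, 14.25, 14.41, 14.32, 14.12, 14.35, 14.04, 14.03, 14.12]
GPU_EVAL = [14.60, 14.47, 14.19, 14.21, 14.26, 14.22, 14.61, 14.56, 14.52, 14.23]

if __name__ == "__main__":
    print(f"cpu mean eval rate: {mean(CPU_EVAL):.2f} tokens/s")  # 14.24
    print(f"gpu mean eval rate: {mean(GPU_EVAL):.2f} tokens/s")  # 14.39
```

The two means differ by about 1%, which matches the observation that the big model gains nothing from the GPU on this machine.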

GiteaMirror commented

2026-04-28 08:36:48 -05:00

@MrUhu commented on GitHub (Jul 13, 2025):

You wrote that you have 4 GB of VRAM available for the iGPU.

> # Questions
>
> small model (< VRAM) works well for gpu
>
> but big model (> VRAM) can't work well for gpu? why?
>
> BIOS can't increase VRAM, max is 4G

The big model is 6.9 GB, if I'm not mistaken.
That is more than your VRAM can hold (keep in mind that the key-value cache also needs to fit in VRAM for best performance). So about 1/3 of this model's layers are computed on the CPU because they have to stay in system memory (RAM).
The limiting factor is therefore your CPU.

That's why the CPU and GPU tests perform the same.
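
The reasoning above can be turned into a back-of-the-envelope estimate. This is only a sketch under loud assumptions: it treats all layers as equally sized, treats GB and GiB as interchangeable, and uses a hypothetical `reserve` parameter as a stand-in for the key-value cache and other overhead. It is not how Ollama's scheduler actually decides the split. The 6.9 figure comes from this comment, and 4.0 / 2.6 are the `total` and `available` values from the server log.

```python
def estimate_gpu_share(model_size: float, vram: float, reserve: float = 0.0) -> float:
    """Estimate the fraction of model weights that fit in VRAM.

    All arguments use the same unit (for example GiB). `reserve` is an
    assumed amount of VRAM set aside for the KV cache and other overhead.
    """
    if model_size <= 0:
        raise ValueError("model_size must be positive")
    usable = max(vram - reserve, 0.0)
    # A model smaller than the usable VRAM fits entirely, so cap at 1.0.
    return min(usable / model_size, 1.0)


if __name__ == "__main__":
    model = 6.9  # model size quoted in this comment
    for label, vram in (("total", 4.0), ("available", 2.6)):
        share = estimate_gpu_share(model, vram)
        print(f"{label} VRAM {vram}: ~{share:.0%} on GPU, ~{1 - share:.0%} on CPU")
```

Whatever the exact split, any share left on the CPU has to be computed there for every generated token, so the slower device sets the pace.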


GiteaMirror commented

2026-04-28 08:36:52 -05:00

@lukedupin commented on GitHub (Sep 4, 2025):

Arch Linux, Framework 13, AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics

When I run the workaround:

HSA_OVERRIDE_GFX_VERSION="11.0.2" OLLAMA_LLM_LIBRARY="rocm_v60000" /usr/bin/ollama serve

I was getting a crash with a stack trace. But after I uninstalled ollama and then reinstalled ollama and ollama-rocm, it started working. I came here to post my stack trace, only to find that it now worked.

Hope this helps someone.
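
For readers wondering where the `11.0.2` in the workaround comes from: it is the `x.y.z` spelling of the `gfx1102` target that the override makes the runtime treat the 780M (`gfx1103`) as. The Python sketch below illustrates that naming convention. `gfx_to_override` is a hypothetical helper, and it assumes the common scheme in which the last two characters of the gfx name are the minor and stepping values (hexadecimal) and everything before them is the major version.

```python
def gfx_to_override(gfx_name: str) -> str:
    """Convert a target name such as 'gfx1102' to 'major.minor.stepping'."""
    if not gfx_name.startswith("gfx") or len(gfx_name) < 6:
        raise ValueError(f"unexpected gfx name: {gfx_name!r}")
    digits = gfx_name[3:]
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    # Minor and stepping are single hex characters in this naming scheme.
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


if __name__ == "__main__":
    # Environment for the workaround in this thread: run gfx1103 as gfx1102.
    env = {"HSA_OVERRIDE_GFX_VERSION": gfx_to_override("gfx1102")}
    print(env)
```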


GiteaMirror commented

2026-04-28 08:36:54 -05:00

@dhiltgen commented on GitHub (Mar 11, 2026):

Release 0.17.8 updates Linux to ROCm v7, which covers support for this GPU. Please give the RC a try (https://github.com/ollama/ollama/blob/main/docs/linux.mdx#installing-specific-versions) and let us know if you run into any problems.



Reference: github-starred/ollama#48477