[GH-ISSUE #9180] Ollama not running with ROCm backend? #31739

Open
opened 2026-04-22 12:27:48 -05:00 by GiteaMirror · 14 comments
Owner

Originally created by @eliranwong on GitHub (Feb 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9180

What is the issue?

Hi, I've just done a quick speed test with Ollama and Llama.cpp, using the same model files, on my iGPU-only device.

Below are the results:

- [Ollama Speed Test Result](https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/igpu_only/GPD_Pocket_4.md#speed-test-with-ollama)
- [LLama.cpp Speed Test Result with CPU backend](https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/igpu_only/GPD_Pocket_4.md#test-speed-with-llamacpp---cpu-backend)
- [LLama.cpp Speed Test Result with ROCm backend](https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/igpu_only/GPD_Pocket_4.md#test-speed-with-llamacpp---rocm-backend)

Ollama is slightly slower than Llama.cpp with the CPU backend, but much slower than Llama.cpp with the ROCm backend. It appears to me that Ollama may not be running with the ROCm backend at all. Did I miss something?

Relevant log output

[Ollama Speed Test Result](https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/igpu_only/GPD_Pocket_4.md#speed-test-with-ollama)

[LLama.cpp Speed Test Result with CPU backend](https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/igpu_only/GPD_Pocket_4.md#test-speed-with-llamacpp---cpu-backend)

[LLama.cpp Speed Test Result with ROCm backend](https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/igpu_only/GPD_Pocket_4.md#test-speed-with-llamacpp---rocm-backend)

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.5.11

GiteaMirror added the performance, bug, amd labels 2026-04-22 12:27:48 -05:00
@eliranwong commented on GitHub (Feb 18, 2025):

Remarks: on another device of mine, which has dual discrete AMD GPUs, both Ollama and Llama.cpp run at similar speeds. The issue described above appears only on the iGPU-only device.

@rick-github commented on GitHub (Feb 18, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

@elarbor commented on GitHub (Feb 18, 2025):

I would also like to know how to call the HTTP API externally after deployment.

@eliranwong commented on GitHub (Feb 18, 2025):

> [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

Appreciate your help. Here is the server log.

[server_log_output.txt](https://github.com/user-attachments/files/18839087/server_log_output.txt)

From the server log, these lines appear relevant:

```
Feb 16 07:31:24 ai ollama[1781]: time=2025-02-16T07:31:24.513Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Feb 16 07:31:24 ai ollama[1781]: time=2025-02-16T07:31:24.525Z level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Feb 16 07:31:24 ai ollama[1781]: time=2025-02-16T07:31:24.533Z level=WARN source=amd_linux.go:376 msg="amdgpu is not supported (supported types:[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942])" gpu_type=gfx1150 gpu=0 library=/usr/local/lib/ollama/rocm
```

In particular, in my case I need Ollama to support gfx type `gfx1151`, as I am running an RDNA-3.5-based iGPU, the AMD Radeon™ 890M.
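As a quick sanity check, the key field in such a log line can be extracted mechanically. A small sketch (the sample line is copied from the log above; the `line` and `gpu_type` names are just illustrative):

```shell
# Pull the gpu_type field out of an Ollama GPU-discovery log line.
# Sample line copied from the server log quoted above.
line='time=2025-02-16T07:31:24.533Z level=WARN source=amd_linux.go:376 msg="amdgpu is not supported" gpu_type=gfx1150 gpu=0 library=/usr/local/lib/ollama/rocm'
gpu_type=$(printf '%s\n' "$line" | grep -o 'gpu_type=[[:alnum:]]*' | cut -d= -f2)
echo "detected target: $gpu_type"   # detected target: gfx1150
```

Note the mismatch this surfaces: the driver reports `gfx1150` while the rocBLAS kernels discussed below ship for `gfx1151`, which is why an override comes up later in the thread.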

Information below may also help:

I am using ROCm version: 6.3.2

When I run:

```
$ ls /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx*.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1012.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1151.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1200.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1201.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat
```
@eliranwong commented on GitHub (Feb 18, 2025):

> I would also like to know how to call the HTTP API externally after deployment.

I'm not sure I fully understand your question. The Ollama server itself exposes an HTTP API, and with llama.cpp you can use [llama-server](https://github.com/ggml-org/llama.cpp#llama-server). From either a developer's or a user's perspective I believe there are many options; personally I use [AgentMake](https://github.com/eliranwong/agentmake), which supports both Ollama and Llama.cpp.

@rick-github commented on GitHub (Feb 18, 2025):

Have you tried overriding driver selection by setting [`HSA_OVERRIDE_GFX_VERSION`](https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides-on-linux)?

@eliranwong commented on GitHub (Feb 18, 2025):

> Have you tried overriding driver selection by setting [`HSA_OVERRIDE_GFX_VERSION`](https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides-on-linux)?

I tried with and without `HSA_OVERRIDE_GFX_VERSION`; it makes no difference to Ollama's speed. With llama.cpp, however, speed is much faster with `HSA_OVERRIDE_GFX_VERSION=11.5.1`.

I wrote some notes about the choice of `11.5.1` at https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu/blob/main/igpu_only/GPD_Pocket_4.md#remarks-about-hsa_override_gfx_version

By default, the gfx target reported by `rocminfo` is gfx1150, but overriding with `HSA_OVERRIDE_GFX_VERSION=11.5.1` helps llama.cpp only.
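One thing worth ruling out when an environment variable affects llama.cpp (launched from a shell) but not a systemd-managed Ollama: the variable has to be set in the service environment, or the server process never sees it. A sketch per Ollama's Linux docs, as a drop-in file of the kind `sudo systemctl edit ollama.service` creates (path shown is the conventional drop-in location):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# created with: sudo systemctl edit ollama.service
# applied with: sudo systemctl restart ollama
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"
```

After restarting, the override (or its absence) should be visible in the GPU-discovery lines of `journalctl -u ollama`.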

@eliranwong commented on GitHub (Feb 19, 2025):

I have put together setup notes for ROCm, Ollama, and llama.cpp at https://github.com/eliranwong/AMD_iGPU_AI_Setup, for your reference.

@eliranwong commented on GitHub (Feb 25, 2025):

An update: I also tested llama.cpp with the Vulkan backend. It is even faster than the ROCm backend.

I documented the results at: https://github.com/eliranwong/AMD_iGPU_AI_Setup#speed-tests

If Ollama doesn't support ROCm well, would you consider adding Vulkan support to Ollama?

@rick-github commented on GitHub (Feb 25, 2025):

https://github.com/ollama/ollama/pull/5059

@eliranwong commented on GitHub (Feb 25, 2025):

> An update: I also tested llama.cpp with the Vulkan backend. It is even faster than the ROCm backend.
>
> I documented the results at: https://github.com/eliranwong/AMD_iGPU_AI_Setup#speed-tests
>
> If Ollama doesn't support ROCm well, would you consider adding Vulkan support to Ollama?

Sorry, I have corrected the links for testing the Vulkan backend.

@eliranwong commented on GitHub (Feb 26, 2025):

I think this is the cause of the issue:

When I run:

```
$ ls /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx*.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1012.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1151.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1200.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1201.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat
```

However, when I run:

```
$ ls /usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx*.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1030.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1100.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1101.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx1102.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx900.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx906.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx908.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx90a.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx940.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx941.dat
/usr/local/lib/ollama/rocm/rocblas/library/TensileLibrary_lazy_gfx942.dat
```

I think the rocBLAS library bundled with the current Ollama release is too old to support `gfx1151`.
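Diffing the two listings mechanically supports that reading. A small sketch (the target lists are copied from the two `ls` outputs above; POSIX shell, no external dependencies):

```shell
# gfx targets in the system ROCm 6.3.2 install vs. Ollama's bundled rocBLAS,
# copied from the two directory listings above.
system="gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102 gfx1151 gfx1200 gfx1201 gfx900 gfx906 gfx908 gfx90a gfx942"
bundled="gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942"
missing=""
for g in $system; do
  case " $bundled " in
    *" $g "*) ;;                  # present in the Ollama bundle
    *) missing="$missing $g" ;;   # only in the system ROCm install
  esac
done
echo "missing from Ollama bundle:$missing"
# → missing from Ollama bundle: gfx1010 gfx1012 gfx1151 gfx1200 gfx1201
```

`gfx1151` lands in the missing set, which matches the conclusion that the bundled library predates RDNA 3.5 support.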

@rick-github commented on GitHub (Feb 26, 2025):

https://github.com/ollama/ollama/pull/9304

@eliranwong commented on GitHub (Feb 26, 2025):

> #9304

Brilliant! Much appreciated.


Reference: github-starred/ollama#31739