[GH-ISSUE #2503] Support Radeon RX 5700 XT (gfx1010) #63502

New Issue

GiteaMirror · 2026-05-03T13:51:53-05:00

GiteaMirror commented

2026-05-03 13:51:53 -05:00

Originally created by @scabros on GitHub (Feb 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2503

Originally assigned to: @dhiltgen on GitHub.

Hi! Congrats for the great project!

We were trying to test ollama with AMD GPU support and we struggled a bit because the install guides are not clear that CUDA libraries are required for ollama (or llama.cpp) to work properly even with team red GPUs.

The error when running ollama run llama2 was (leaving here for reference):
...
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: ROCm_Host input buffer size = 13.01 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 164.00 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 8.00 MiB
llama_new_context_with_model: graph splits (measure): 3
CUDA error: shared object initialization failed
current device: 0, in function ggml_cuda_op_flatten at /home/devel/ollama/llm/llama.cpp/ggml-cuda.cu:9208
hipGetLastError()
loading library /tmp/ollama3700311510/rocm_v6/libext_server.so
GGML_ASSERT: /home/devel/ollama/llm/llama.cpp/ggml-cuda.cu:241: !"CUDA error"
[New LWP 4411]
[New LWP 4412]
[New LWP 4413]
[New LWP 4414]
[New LWP 4415]
...

after we installed the cuda libraries as per the instructions HERE the problem went away.

we also faced problems with ROCm 6.0.2 support for different gpu models (in our case it is a RX 5700 XT, arch gfx1010), the current binary packages doesn't contain TensileLibrary.dat (that somehow "maps" the kernels objects to use with different GPUs).

We had this error:

time=2024-02-09T17:05:38.481Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama3752973675/rocm_v6/libext_server.so"
time=2024-02-09T17:05:38.481Z level=INFO source=dyn_ext_server.go:145 msg="Initializing llama server"
rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010
free(): invalid pointer
SIGABRT: abort
PC=0x7f4bb13739fc m=3 sigcode=18446744073709551610

so I downloaded full rocm source and tried to build again, just to get the right command for compiling TensileLibrary.dat :

this was the command that cmake uses:
'/home/devel/rocBLAS/build/virtualenv/lib/python3.10/site-packages/Tensile/bin/TensileCreateLibrary' '--merge-files' '--separate-architectures' '--lazy-library-loading' '--no-short-file-names' '--library-print-debug' '--code-object-version=default' '--cxx-compiler=hipcc' '--jobs=14' '--library-format=msgpack' '--architecture=gfx1012' '/home/devel/rocBLAS/library/src/blas3/Tensile/Logic/asm_full' '/home/devel/rocBLAS/build/Tensile' 'HIP'

this is the command I used to generate a new TensileLibrary.dat:
'/home/devel/rocBLAS/build/virtualenv/lib/python3.10/site-packages/Tensile/bin/TensileCreateLibrary' '--merge-files' '--no-short-file-names' '--library-print-debug' '--code-object-version=default' '--cxx-compiler=hipcc' '--jobs=14' '--library-format=msgpack' '/home/devel/rocBLAS/library/src/blas3/Tensile/Logic/asm_full' '/home/devel/rocBLAS/build/Tensile' 'HIP'

(i removed '--separate-architectures' '--lazy-library-loading' as per instructions in this bug )

Hope this helps to others! Thanks again!

Originally created by @scabros on GitHub (Feb 14, 2024). Original GitHub issue: https://github.com/ollama/ollama/issues/2503 Originally assigned to: @dhiltgen on GitHub. Hi! Congrats for the great project! We were trying to test ollama with AMD GPU support and we struggled a bit because the install guides are not clear that CUDA libraries are required for ollama (or llama.cpp) to work properly even with team red GPUs. The error when running ollama run llama2 was (leaving here for reference): ... llama_new_context_with_model: n_ctx = 2048 llama_new_context_with_model: freq_base = 10000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: ROCm0 KV buffer size = 1024.00 MiB llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB llama_new_context_with_model: ROCm_Host input buffer size = 13.01 MiB llama_new_context_with_model: ROCm0 compute buffer size = 164.00 MiB llama_new_context_with_model: ROCm_Host compute buffer size = 8.00 MiB llama_new_context_with_model: graph splits (measure): 3 **CUDA error: shared object initialization failed current device: 0, in function ggml_cuda_op_flatten at /home/devel/ollama/llm/llama.cpp/ggml-cuda.cu:9208 hipGetLastError() loading library /tmp/ollama3700311510/rocm_v6/libext_server.so GGML_ASSERT: /home/devel/ollama/llm/llama.cpp/ggml-cuda.cu:241: !"CUDA error"** [New LWP 4411] [New LWP 4412] [New LWP 4413] [New LWP 4414] [New LWP 4415] ... after we installed the cuda libraries as per the instructions [HERE](https://developer.nvidia.com/cuda-downloads) the problem went away. we also faced problems with ROCm 6.0.2 support for different gpu models (in our case it is a RX 5700 XT, arch gfx1010), the current binary packages doesn't contain TensileLibrary.dat (that somehow "maps" the kernels objects to use with different GPUs). We had this error: time=2024-02-09T17:05:38.481Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama3752973675/rocm_v6/libext_server.so" time=2024-02-09T17:05:38.481Z level=INFO source=dyn_ext_server.go:145 msg="Initializing llama server" **rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010** free(): invalid pointer SIGABRT: abort PC=0x7f4bb13739fc m=3 sigcode=18446744073709551610 so I downloaded full rocm source and tried to build again, just to get the right command for compiling TensileLibrary.dat : this was the command that cmake uses: '/home/devel/rocBLAS/build/virtualenv/lib/python3.10/site-packages/Tensile/bin/TensileCreateLibrary' '--merge-files' '--separate-architectures' '--lazy-library-loading' '--no-short-file-names' '--library-print-debug' '--code-object-version=default' '--cxx-compiler=hipcc' '--jobs=14' '--library-format=msgpack' '--architecture=gfx1012' '/home/devel/rocBLAS/library/src/blas3/Tensile/Logic/asm_full' '/home/devel/rocBLAS/build/Tensile' 'HIP' this is the command I used to generate a new TensileLibrary.dat: '/home/devel/rocBLAS/build/virtualenv/lib/python3.10/site-packages/Tensile/bin/TensileCreateLibrary' '--merge-files' '--no-short-file-names' '--library-print-debug' '--code-object-version=default' '--cxx-compiler=hipcc' '--jobs=14' '--library-format=msgpack' '/home/devel/rocBLAS/library/src/blas3/Tensile/Logic/asm_full' '/home/devel/rocBLAS/build/Tensile' 'HIP' (i removed '--separate-architectures' '--lazy-library-loading' as per instructions [in this bug](https://github.com/ROCm/Tensile/issues/1757) ) Hope this helps to others! Thanks again!

GiteaMirror added the documentation amd labels 2026-05-03 13:51:57 -05:00

GiteaMirror commented

2026-05-03 13:52:01 -05:00

@dhiltgen commented on GitHub (Mar 12, 2024):

Unfortunately, the official ROCm builds from AMD don't currently support the RX 5700 XT.

With the new release 0.1.29, we'll now detect this incompatibility, and gracefully fall back to CPU mode and log some information in the server log about what happened. There is a facility to override this manually with HSA_OVERRIDE_GFX_VERSION however, I'm not sure what supported gfx target will work for this GPU.

AMD is working on updates to a future version of ROCm v6 that will support GPU families, so there's hope this will be added in the future. There's also a possibility to build the ROCm tensor library to add specific targets, however I haven't investigated the details on that yet.

@dhiltgen commented on GitHub (Mar 12, 2024): Unfortunately, the official ROCm builds from AMD don't currently support the RX 5700 XT. With the new release 0.1.29, we'll now detect this incompatibility, and gracefully fall back to CPU mode and log some information in the server log about what happened. There is a facility to override this manually with `HSA_OVERRIDE_GFX_VERSION` however, I'm not sure what supported gfx target will work for this GPU. AMD is working on updates to a future version of ROCm v6 that will support GPU families, so there's hope this will be added in the future. There's also a possibility to build the ROCm tensor library to add specific targets, however I haven't investigated the details on that yet.

GiteaMirror commented

2026-05-03 13:52:02 -05:00

@jpmcb commented on GitHub (Mar 15, 2024):

It may be a good idea to keep an eye on ROCm/Tensile now that gfx1010 support has merged: https://github.com/ROCm/Tensile/pull/1897 which, from my understanding, once they cut a new version of that driver, it should flow into the main ROCm/ROCm build.

So, depending on how ollama detects GPU support, it may just start working.

I also ran into this and would love support of slightly older Radeon RX 5600:

❯ sudo ollama serve

...

time=2024-03-15T14:15:29.915-06:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-15T14:15:29.915-06:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-15T14:15:29.927-06:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-15T14:15:29.927-06:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-15T14:15:29.927-06:00 level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-15T14:15:29.928-06:00 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1010]"
time=2024-03-15T14:15:29.943-06:00 level=WARN source=amd_linux.go:114 msg="amdgpu [0] gfx1010 is not supported by /tmp/ollama402405963/rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-03-15T14:15:29.943-06:00 level=WARN source=amd_linux.go:116 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-15T14:15:29.943-06:00 level=INFO source=amd_linux.go:127 msg="all detected amdgpus are skipped, falling back to CPU"
time=2024-03-15T14:15:29.943-06:00 level=INFO source=routes.go:1133 msg="no GPU detected"

but thanks for building in detection of AMD gpu families that aren't supported: saved me lots of useless debugging time 👀 happy to help validate any patches: feel free to @ me

@jpmcb commented on GitHub (Mar 15, 2024): It may be a good idea to keep an eye on `ROCm/Tensile` now that `gfx1010` support has merged: https://github.com/ROCm/Tensile/pull/1897 which, from my understanding, once they cut a new version of that driver, it should flow into the main `ROCm/ROCm` build. So, depending on how ollama detects GPU support, it may just start working. --- I also ran into this and would love support of slightly older Radeon RX 5600: ``` ❯ sudo ollama serve ... time=2024-03-15T14:15:29.915-06:00 level=INFO source=gpu.go:77 msg="Detecting GPU type" time=2024-03-15T14:15:29.915-06:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so" time=2024-03-15T14:15:29.927-06:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []" time=2024-03-15T14:15:29.927-06:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2" time=2024-03-15T14:15:29.927-06:00 level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-03-15T14:15:29.928-06:00 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1010]" time=2024-03-15T14:15:29.943-06:00 level=WARN source=amd_linux.go:114 msg="amdgpu [0] gfx1010 is not supported by /tmp/ollama402405963/rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]" time=2024-03-15T14:15:29.943-06:00 level=WARN source=amd_linux.go:116 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage" time=2024-03-15T14:15:29.943-06:00 level=INFO source=amd_linux.go:127 msg="all detected amdgpus are skipped, falling back to CPU" time=2024-03-15T14:15:29.943-06:00 level=INFO source=routes.go:1133 msg="no GPU detected" ``` but thanks for building in detection of AMD gpu families that aren't supported: saved me lots of useless debugging time 👀 happy to help validate any patches: feel free to @ me

GiteaMirror commented

2026-05-03 13:52:02 -05:00

@puccaso commented on GitHub (Mar 28, 2024):

Having this same issue - running suse tumbleweed.
due to AMD only supporting opensuse enterprise, i can't actually get the ROCM drivers installed completely... i'd have to change to ubuntu..
i have thought about trying distrobox, but the requirements show a kernel has to be installed and dkim involved so i dont want to break current system testing this.
i will eventually find a way though.
love this project indeed!

@puccaso commented on GitHub (Mar 28, 2024): Having this same issue - running suse tumbleweed. due to AMD only supporting opensuse enterprise, i can't actually get the ROCM drivers installed completely... i'd have to change to ubuntu.. i have thought about trying distrobox, but the requirements show a kernel has to be installed and dkim involved so i dont want to break current system testing this. i will eventually find a way though. love this project indeed!

GiteaMirror commented

2026-05-03 13:52:03 -05:00

@aionik-me commented on GitHub (Apr 17, 2024):

The gfx1010 should work, but you'll need to manually override what's allowed and in some cases map it to closest supported type. I suggest you try nixos flake of ollama here: [https://github.com/abysssol/ollama-flake]

Only modifications that you may need for your "unsupported" gpu on nixos:

You may need to map rocblas to support your gpu (defaults): sudo ln -s /nix/store/nwij51jpnczrf8qc0zhdnd4rhxdwsgsj-rocblas-5.7.1/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat /nix/store/nwij51jpnczrf8qc0zhdnd4rhxdwsgsj-rocblas-5.7.1/lib/rocblas/library/TensileLibrary.dat
You may also need to pass or export this type of env variable: HSA_OVERRIDE_GFX_VERSION: "10.3.0"
Match these to your closest GPU capabilities.

netdata is a good way to monitor remote, dig into specific periods of time, isolate when GPU is being used, or balanced, etc.

Please let me know if you have questions or feedkback.

@aionik-me commented on GitHub (Apr 17, 2024): The gfx1010 should work, but you'll need to manually override what's allowed and in some cases map it to closest supported type. I suggest you try nixos flake of ollama here: [https://github.com/abysssol/ollama-flake] Only modifications that you may need for your "unsupported" gpu on nixos: - You may need to map rocblas to support your gpu (defaults): `sudo ln -s /nix/store/nwij51jpnczrf8qc0zhdnd4rhxdwsgsj-rocblas-5.7.1/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat /nix/store/nwij51jpnczrf8qc0zhdnd4rhxdwsgsj-rocblas-5.7.1/lib/rocblas/library/TensileLibrary.dat` - You may also need to pass or export this type of env variable: HSA_OVERRIDE_GFX_VERSION: "10.3.0" - Match these to your closest GPU capabilities. netdata is a good way to monitor remote, dig into specific periods of time, isolate when GPU is being used, or balanced, etc. Please let me know if you have questions or feedkback.

GiteaMirror commented

2026-05-03 13:52:04 -05:00

@skidunion commented on GitHub (May 25, 2024):

Match these to your closest GPU capabilities

@aionik-me could you provide some more insight on what that means? I followed the previous steps on Windows, but I just get gibberish when I try to prompt the model with anything. With CPU compute it works fine.

2024/05/25 16:52:11 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR:C:\\Users\\<user>\\AppData\\Local\\Programs\\Ollama\\ollama_runners OLLAMA_TMPDIR:]"
time=2024-05-25T16:52:11.474+03:00 level=INFO source=images.go:704 msg="total blobs: 20"
time=2024-05-25T16:52:11.475+03:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-25T16:52:11.476+03:00 level=INFO source=routes.go:1054 msg="Listening on 192.168.0.100:11434 (version 0.1.38)"
time=2024-05-25T16:52:11.476+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [rocm_v5.7 cpu cpu_avx cpu_avx2 cuda_v11.3]"
time=2024-05-25T16:52:11.510+03:00 level=INFO source=amd_windows.go:63 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2024-05-25T16:52:11.861+03:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1010:xnack- driver=0.0 name="AMD Radeon RX 5700 XT" total="8.0 GiB" available="7.9 GiB"
time=2024-05-25T16:52:18.161+03:00 level=INFO source=amd_windows.go:63 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2024-05-25T16:52:20.404+03:00 level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=33 memory.available="7.9 GiB" memory.required.full="5.4 GiB" memory.required.partial="5.4 GiB" memory.required.kv="512.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-05-25T16:52:20.404+03:00 level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=33 memory.available="7.9 GiB" memory.required.full="5.4 GiB" memory.required.partial="5.4 GiB" memory.required.kv="512.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-05-25T16:52:20.409+03:00 level=INFO source=server.go:320 msg="starting llama server" cmd="C:\\Users\\<user>\\AppData\\Local\\Programs\\Ollama\\ollama_runners\\rocm_v5.7\\ollama_llama_server.exe --model C:\\Users\\<user>\\.ollama\\models\\blobs\\sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 4096 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --parallel 1 --port 49782"
time=2024-05-25T16:52:20.412+03:00 level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-05-25T16:52:20.412+03:00 level=INFO source=server.go:504 msg="waiting for llama runner to start responding"
time=2024-05-25T16:52:20.412+03:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error"
INFO [wmain] build info | build=2770 commit="952d03d" tid="9456" timestamp=1716645140
INFO [wmain] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="9456" timestamp=1716645140 total_threads=16
INFO [wmain] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="49782" tid="9456" timestamp=1716645140
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from C:\Users\<user>\.ollama\models\blobs\sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
time=2024-05-25T16:52:20.663+03:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'

rocBLAS warning: No paths matched C:\Users\<user>\AppData\Local\Programs\Ollama\rocm\\rocblas\library\*gfx1010*co. Make sure that ROCBLAS_TENSILE_LIBPATH is set correctly.
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 5700 XT, compute capability 10.1, VMM: no
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  4155.99 MiB
llm_load_tensors:        CPU buffer size =   281.81 MiB
......................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =   512.00 MiB
llama_new_context_with_model: KV self size  =  512.00 MiB, K (f16):  256.00 MiB, V (f16):  256.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0.50 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   296.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    16.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2
INFO [wmain] model loaded | tid="9456" timestamp=1716645143
time=2024-05-25T16:52:23.676+03:00 level=INFO source=server.go:545 msg="llama runner started in 3.26 seconds"
[GIN] 2024/05/25 - 16:52:31 | 200 |   13.7334707s |   192.168.0.106 | POST     "/api/chat"

@skidunion commented on GitHub (May 25, 2024): > Match these to your closest GPU capabilities @aionik-me could you provide some more insight on what that means? I followed the previous steps on Windows, but I just get gibberish when I try to prompt the model with anything. With CPU compute it works fine. ![image](https://github.com/ollama/ollama/assets/34306421/9cb24345-9d23-4ed6-8c1e-df7e752f2599) ``` 2024/05/25 16:52:11 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR:C:\\Users\\<user>\\AppData\\Local\\Programs\\Ollama\\ollama_runners OLLAMA_TMPDIR:]" time=2024-05-25T16:52:11.474+03:00 level=INFO source=images.go:704 msg="total blobs: 20" time=2024-05-25T16:52:11.475+03:00 level=INFO source=images.go:711 msg="total unused blobs removed: 0" time=2024-05-25T16:52:11.476+03:00 level=INFO source=routes.go:1054 msg="Listening on 192.168.0.100:11434 (version 0.1.38)" time=2024-05-25T16:52:11.476+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [rocm_v5.7 cpu cpu_avx cpu_avx2 cuda_v11.3]" time=2024-05-25T16:52:11.510+03:00 level=INFO source=amd_windows.go:63 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0 time=2024-05-25T16:52:11.861+03:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1010:xnack- driver=0.0 name="AMD Radeon RX 5700 XT" total="8.0 GiB" available="7.9 GiB" time=2024-05-25T16:52:18.161+03:00 level=INFO source=amd_windows.go:63 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0 time=2024-05-25T16:52:20.404+03:00 level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=33 memory.available="7.9 GiB" memory.required.full="5.4 GiB" memory.required.partial="5.4 GiB" memory.required.kv="512.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="677.5 MiB" time=2024-05-25T16:52:20.404+03:00 level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=33 memory.available="7.9 GiB" memory.required.full="5.4 GiB" memory.required.partial="5.4 GiB" memory.required.kv="512.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="677.5 MiB" time=2024-05-25T16:52:20.409+03:00 level=INFO source=server.go:320 msg="starting llama server" cmd="C:\\Users\\<user>\\AppData\\Local\\Programs\\Ollama\\ollama_runners\\rocm_v5.7\\ollama_llama_server.exe --model C:\\Users\\<user>\\.ollama\\models\\blobs\\sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 4096 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --parallel 1 --port 49782" time=2024-05-25T16:52:20.412+03:00 level=INFO source=sched.go:338 msg="loaded runners" count=1 time=2024-05-25T16:52:20.412+03:00 level=INFO source=server.go:504 msg="waiting for llama runner to start responding" time=2024-05-25T16:52:20.412+03:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error" INFO [wmain] build info | build=2770 commit="952d03d" tid="9456" timestamp=1716645140 INFO [wmain] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="9456" timestamp=1716645140 total_threads=16 INFO [wmain] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="49782" tid="9456" timestamp=1716645140 llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from C:\Users\<user>\.ollama\models\blobs\sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct llama_model_loader: - kv 2: llama.block_count u32 = 32 llama_model_loader: - kv 3: llama.context_length u32 = 8192 llama_model_loader: - kv 4: llama.embedding_length u32 = 4096 llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 6: llama.attention.head_count u32 = 32 llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 10: general.file_type u32 = 2 llama_model_loader: - kv 11: llama.vocab_size u32 = 128256 llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ... llama_model_loader: - kv 21: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type q4_0: 225 tensors llama_model_loader: - type q6_K: 1 tensors time=2024-05-25T16:52:20.663+03:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server loading model" llm_load_vocab: special tokens definition check successful ( 256/128256 ). llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = BPE llm_load_print_meta: n_vocab = 128256 llm_load_print_meta: n_merges = 280147 llm_load_print_meta: n_ctx_train = 8192 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 500000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_yarn_orig_ctx = 8192 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: model type = 8B llm_load_print_meta: model ftype = Q4_0 llm_load_print_meta: model params = 8.03 B llm_load_print_meta: model size = 4.33 GiB (4.64 BPW) llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>' llm_load_print_meta: EOS token = 128009 '<|eot_id|>' llm_load_print_meta: LF token = 128 'Ä' llm_load_print_meta: EOT token = 128009 '<|eot_id|>' rocBLAS warning: No paths matched C:\Users\<user>\AppData\Local\Programs\Ollama\rocm\\rocblas\library\*gfx1010*co. Make sure that ROCBLAS_TENSILE_LIBPATH is set correctly. ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon RX 5700 XT, compute capability 10.1, VMM: no llm_load_tensors: ggml ctx size = 0.30 MiB llm_load_tensors: offloading 32 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 33/33 layers to GPU llm_load_tensors: ROCm0 buffer size = 4155.99 MiB llm_load_tensors: CPU buffer size = 281.81 MiB ...................................................................................... llama_new_context_with_model: n_ctx = 4096 llama_new_context_with_model: n_batch = 512 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: freq_base = 500000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: ROCm0 KV buffer size = 512.00 MiB llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB llama_new_context_with_model: ROCm_Host output buffer size = 0.50 MiB llama_new_context_with_model: ROCm0 compute buffer size = 296.00 MiB llama_new_context_with_model: ROCm_Host compute buffer size = 16.01 MiB llama_new_context_with_model: graph nodes = 1030 llama_new_context_with_model: graph splits = 2 INFO [wmain] model loaded | tid="9456" timestamp=1716645143 time=2024-05-25T16:52:23.676+03:00 level=INFO source=server.go:545 msg="llama runner started in 3.26 seconds" [GIN] 2024/05/25 - 16:52:31 | 200 | 13.7334707s | 192.168.0.106 | POST "/api/chat" ```

GiteaMirror commented

2026-05-03 13:52:05 -05:00

@Zippy-boy commented on GitHub (Jun 8, 2024):

Iv been trying to do the same with my RX5700.
I noticed this when trying to serve:
time=2024-06-08T13:06:06.875Z level=WARN source=amd_linux.go:296 msg="amdgpu is not supported" gpu=0 gpu_type=gfx1010 library=/opt/rocm/lib supported_types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"

Im pretty sure that the GFX1010 is supported my ROCm now, so is there an amdgpu supported_type file that i can add the gfx1010 to. And then see if it works with it? Im not very knowledgeable in this so this may be stupid :/

@Zippy-boy commented on GitHub (Jun 8, 2024): Iv been trying to do the same with my RX5700. I noticed this when trying to serve: `time=2024-06-08T13:06:06.875Z level=WARN source=amd_linux.go:296 msg="amdgpu is not supported" gpu=0 gpu_type=gfx1010 library=/opt/rocm/lib supported_types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"` Im pretty sure that the GFX1010 is supported my ROCm now, so is there an amdgpu supported_type file that i can add the gfx1010 to. And then see if it works with it? Im not very knowledgeable in this so this may be stupid :/

GiteaMirror commented

2026-05-03 13:52:06 -05:00

@b0o commented on GitHub (Jun 11, 2024):

I was trying to run HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve and getting Error: llama runner process has terminated: signal: aborted (core dumped) error:Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010.

I saw a lot of suggestions to use HSA_OVERRIDE_GFX_VERSION="10.3.0" but that caused my GPU to crash.

However, I tried symlinking TensileLibrary_lazy_gfx1010.dat to TensileLibrary_lazy_gfx1030.dat: sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat, and now it seems to be working!

Running HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve and then running a model like ollama run mistral successfully loads and runs the model on my GPU.

@b0o commented on GitHub (Jun 11, 2024): I was trying to run `HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve` and getting `Error: llama runner process has terminated: signal: aborted (core dumped) error:Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010`. I saw a lot of suggestions to use `HSA_OVERRIDE_GFX_VERSION="10.3.0"` but that caused my GPU to crash. However, I tried symlinking `TensileLibrary_lazy_gfx1010.dat` to `TensileLibrary_lazy_gfx1030.dat`: `sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat`, and now it seems to be working! Running `HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve` and then running a model like `ollama run mistral` successfully loads and runs the model on my GPU.

GiteaMirror commented

2026-05-03 13:52:07 -05:00

@thednp commented on GitHub (Jun 25, 2024):

Any news on this?

@thednp commented on GitHub (Jun 25, 2024): Any news on this?

GiteaMirror commented

2026-05-03 13:52:08 -05:00

@kdta91 commented on GitHub (Jul 2, 2024):

I was trying to run HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve and getting Error: llama runner process has terminated: signal: aborted (core dumped) error:Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010.

I saw a lot of suggestions to use HSA_OVERRIDE_GFX_VERSION="10.3.0" but that caused my GPU to crash.

However, I tried symlinking TensileLibrary_lazy_gfx1010.dat to TensileLibrary_lazy_gfx1030.dat: sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat, and now it seems to be working!

Running HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve and then running a model like ollama run mistral successfully loads and runs the model on my GPU.

This works for me in my case. Using RX 5600 XT, however, the VRAM usage is at constant >=5GB.

@kdta91 commented on GitHub (Jul 2, 2024): > I was trying to run `HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve` and getting `Error: llama runner process has terminated: signal: aborted (core dumped) error:Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010`. > > I saw a lot of suggestions to use `HSA_OVERRIDE_GFX_VERSION="10.3.0"` but that caused my GPU to crash. > > However, I tried symlinking `TensileLibrary_lazy_gfx1010.dat` to `TensileLibrary_lazy_gfx1030.dat`: `sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat`, and now it seems to be working! > > Running `HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve` and then running a model like `ollama run mistral` successfully loads and runs the model on my GPU. This works for me in my case. Using RX 5600 XT, however, the VRAM usage is at constant >=5GB.

GiteaMirror commented

2026-05-03 13:52:09 -05:00

@murlakatamenka commented on GitHub (Jul 16, 2024):

I was trying to run HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve and getting Error: llama runner process has terminated: signal: aborted (core dumped) error:Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010.

I saw a lot of suggestions to use HSA_OVERRIDE_GFX_VERSION="10.3.0" but that caused my GPU to crash.

However, I tried symlinking TensileLibrary_lazy_gfx1010.dat to TensileLibrary_lazy_gfx1030.dat: sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat, and now it seems to be working!

Running HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve and then running a model like ollama run mistral successfully loads and runs the model on my GPU.

Do you run Arch with ollama-rocm package?

@murlakatamenka commented on GitHub (Jul 16, 2024): > I was trying to run `HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve` and getting `Error: llama runner process has terminated: signal: aborted (core dumped) error:Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010`. > > I saw a lot of suggestions to use `HSA_OVERRIDE_GFX_VERSION="10.3.0"` but that caused my GPU to crash. > > However, I tried symlinking `TensileLibrary_lazy_gfx1010.dat` to `TensileLibrary_lazy_gfx1030.dat`: `sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat`, and now it seems to be working! > > Running `HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve` and then running a model like `ollama run mistral` successfully loads and runs the model on my GPU. Do you run Arch with `ollama-rocm` package?

GiteaMirror commented

2026-05-03 13:52:10 -05:00

@b0o commented on GitHub (Jul 16, 2024):

Yes, I'm on Arch, and I'm using ollama-rocm-git but IIRC ollama-rocm worked as well.

@b0o commented on GitHub (Jul 16, 2024): Yes, I'm on Arch, and I'm using `ollama-rocm-git` but IIRC `ollama-rocm` worked as well.

GiteaMirror commented

2026-05-03 13:52:11 -05:00

@Mr-Ples commented on GitHub (Jul 18, 2024):

guy has it working here
https://github.com/ollama/ollama/issues/2453#issuecomment-2236193832

@Mr-Ples commented on GitHub (Jul 18, 2024): guy has it working here https://github.com/ollama/ollama/issues/2453#issuecomment-2236193832

GiteaMirror commented

2026-05-03 13:52:14 -05:00

@d8rt8v commented on GitHub (Jul 31, 2024):

Yep, confirming this on win 11.

guy has it working here #2453 (comment)

Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases
Download rocblas.for.gfx1010-.xnack-.with.building.guide.7z from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7.
Go to C:\Users\User\AppData\Local\Programs\Ollama\rocm and copy-replace rocblas.dll alongside library folder from downloaded zip.

@d8rt8v commented on GitHub (Jul 31, 2024): Yep, confirming this on win 11. > guy has it working here [#2453 (comment)](https://github.com/ollama/ollama/issues/2453#issuecomment-2236193832) 1. Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases 2. Download `rocblas.for.gfx1010-.xnack-.with.building.guide.7z` from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7. 3. Go to `C:\Users\User\AppData\Local\Programs\Ollama\rocm` and copy-replace `rocblas.dll` alongside `library` folder from downloaded zip.

GiteaMirror commented

2026-05-03 13:52:16 -05:00

@thednp commented on GitHub (Jul 31, 2024):

Yep, confirming this on win 11.

guy has it working here #2453 (comment)

Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases

Download rocblas.for.gfx1010-.xnack-.with.building.guide.7z from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7.

Go to C:\Users\User\AppData\Local\Programs\Ollama\rocm and copy-replace rocblas.dll alongside library folder from downloaded zip.

Have anyone recreated this in Linux by any chance?

@thednp commented on GitHub (Jul 31, 2024): > Yep, confirming this on win 11. > > > guy has it working here [#2453 (comment)](https://github.com/ollama/ollama/issues/2453#issuecomment-2236193832) > > 1. Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases > 2. Download `rocblas.for.gfx1010-.xnack-.with.building.guide.7z` from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7. > 3. Go to `C:\Users\User\AppData\Local\Programs\Ollama\rocm` and copy-replace `rocblas.dll` alongside `library` folder from downloaded zip. Have anyone recreated this in Linux by any chance?

GiteaMirror commented

2026-05-03 13:52:18 -05:00

@LeonKraim commented on GitHub (Aug 2, 2024):

Doesn't work on win 10 with gfx1010.

Yep, confirming this on win 11.

guy has it working here #2453 (comment)

Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases

Download rocblas.for.gfx1010-.xnack-.with.building.guide.7z from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7.

Go to C:\Users\User\AppData\Local\Programs\Ollama\rocm and copy-replace rocblas.dll alongside library folder from downloaded zip.

@LeonKraim commented on GitHub (Aug 2, 2024): Doesn't work on win 10 with gfx1010. > Yep, confirming this on win 11. > > > guy has it working here [#2453 (comment)](https://github.com/ollama/ollama/issues/2453#issuecomment-2236193832) > > 1. Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases > 2. Download `rocblas.for.gfx1010-.xnack-.with.building.guide.7z` from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7. > 3. Go to `C:\Users\User\AppData\Local\Programs\Ollama\rocm` and copy-replace `rocblas.dll` alongside `library` folder from downloaded zip.

GiteaMirror commented

2026-05-03 13:52:19 -05:00

@JeremyLG commented on GitHub (Aug 2, 2024):

I can confirm it works on Win10 for gfx1010 on my side.
Very very cool, thanks a lot

@JeremyLG commented on GitHub (Aug 2, 2024): I can confirm it works on Win10 for gfx1010 on my side. Very very cool, thanks a lot

GiteaMirror commented

2026-05-03 13:52:19 -05:00

@d8rt8v commented on GitHub (Aug 3, 2024):

Have you installed AMD HIP 5.7.1 previously?

Doesn't work on win 10 with gfx1010.

Yep, confirming this on win 11.

guy has it working here #2453 (comment)

Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases

Download rocblas.for.gfx1010-.xnack-.with.building.guide.7z from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7.

Go to C:\Users\User\AppData\Local\Programs\Ollama\rocm and copy-replace rocblas.dll alongside library folder from downloaded zip.

@d8rt8v commented on GitHub (Aug 3, 2024): Have you installed [AMD HIP 5.7.1](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html) previously? > Doesn't work on win 10 with gfx1010. > > > Yep, confirming this on win 11. > > > guy has it working here [#2453 (comment)](https://github.com/ollama/ollama/issues/2453#issuecomment-2236193832) > > > > > > > > 1. Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases > > 2. Download `rocblas.for.gfx1010-.xnack-.with.building.guide.7z` from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7. > > 3. Go to `C:\Users\User\AppData\Local\Programs\Ollama\rocm` and copy-replace `rocblas.dll` alongside `library` folder from downloaded zip.

GiteaMirror commented

2026-05-03 13:52:20 -05:00

@LeonKraim commented on GitHub (Aug 3, 2024):

Yes, i did fully follow the instructions
the logs still tell me 'amdgpu unsupported'
AMD HIP seems to have installed succesfully

@LeonKraim commented on GitHub (Aug 3, 2024): Yes, i did fully follow the instructions the logs still tell me 'amdgpu unsupported' AMD HIP seems to have installed succesfully

GiteaMirror commented

2026-05-03 13:52:21 -05:00

@ptitjes commented on GitHub (Sep 2, 2024):

Did someone try this on Linux?

@ptitjes commented on GitHub (Sep 2, 2024): Did someone try this on Linux?

GiteaMirror commented

2026-05-03 13:52:21 -05:00

@justinlovinger commented on GitHub (Sep 2, 2024):

Note, this works out of the box on the latest NixOS:

services.ollama = {
  enable = true;
  acceleration = "rocm";
  rocmOverrideGfx = "10.1.0";
};

@justinlovinger commented on GitHub (Sep 2, 2024): Note, this works out of the box on the latest NixOS: ```nix services.ollama = { enable = true; acceleration = "rocm"; rocmOverrideGfx = "10.1.0"; }; ```

GiteaMirror commented

2026-05-03 13:52:22 -05:00

@thednp commented on GitHub (Oct 25, 2024):

Anyone have RX 5700XT running on a debian based distro?

I've installed latest ollama and rocm;
I've added ollama and my_username to the render group (as suggested in the image below)
I'm getting some permissions error related to /dev/kfd device.

Anyone?

@thednp commented on GitHub (Oct 25, 2024): Anyone have RX 5700XT running on a debian based distro? * I've installed latest ollama and rocm; * I've added ollama and my_username to the *render* group (as suggested in the image below) * I'm getting some permissions error related to `/dev/kfd` device. ![Screenshot from 2024-10-25 17-31-26](https://github.com/user-attachments/assets/5ffb50d7-b024-4e19-bfb8-77d637cd6713) Anyone?

GiteaMirror commented

2026-05-03 13:52:23 -05:00

@tecnomanu commented on GitHub (Nov 1, 2024):

Yep, confirming this on win 11.

guy has it working here #2453 (comment)

Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases

Download rocblas.for.gfx1010-.xnack-.with.building.guide.7z from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7.

Go to C:\Users\User\AppData\Local\Programs\Ollama\rocm and copy-replace rocblas.dll alongside library folder from downloaded zip.

Have anyone recreated this in Linux by any chance?
rocblas

Hi! i can reply in ubuntu v24.

The difference here is that you need to create the symlinking with the library inside the ollama directory instead of the rocm opt folder.
For be clare:
Instead of sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat i do this here: sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat

Check the error return by Ollama when you are running the server and fail, there is the correct folder where is the TensileLibrary.dat file.

See you and thanks @b0o !!!! ❤️🙌

@tecnomanu commented on GitHub (Nov 1, 2024): > > Yep, confirming this on win 11. > > > guy has it working here [#2453 (comment)](https://github.com/ollama/ollama/issues/2453#issuecomment-2236193832) > > > > > > > > 1. Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases > > 2. Download `rocblas.for.gfx1010-.xnack-.with.building.guide.7z` from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7. > > 3. Go to `C:\Users\User\AppData\Local\Programs\Ollama\rocm` and copy-replace `rocblas.dll` alongside `library` folder from downloaded zip. > > Have anyone recreated this in Linux by any chance? > rocblas Hi! i can reply in ubuntu v24. The difference here is that you need to create the symlinking with the library inside the ollama directory instead of the rocm opt folder. For be clare: Instead of ```sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat``` i do this here: ```sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat``` Check the error return by Ollama when you are running the server and fail, there is the correct folder where is the TensileLibrary.dat file. See you and thanks @b0o !!!! ❤️🙌

GiteaMirror commented

2026-05-03 13:52:24 -05:00

@b0o commented on GitHub (Nov 1, 2024):

No problem. By the way, it seems this workaround is unnecessary as of opencl-amd-dev 6.2.1, the TensileLibrary_lazy_gfx1010.dat file is now included in the package.

@b0o commented on GitHub (Nov 1, 2024): No problem. By the way, it seems this workaround is unnecessary as of opencl-amd-dev 6.2.1, the `TensileLibrary_lazy_gfx1010.dat` file is now included in the package.

GiteaMirror commented

2026-05-03 13:52:25 -05:00

@tecnomanu commented on GitHub (Nov 2, 2024):

@b0o i need to do this with the last version installed, because in Ubuntu, Ollama use the own libraries, maybe ollama need to update the libraries. I hope they will be careful because with this hack already works.

@tecnomanu commented on GitHub (Nov 2, 2024): @b0o i need to do this with the last version installed, because in Ubuntu, Ollama use the own libraries, maybe ollama need to update the libraries. I hope they will be careful because with this hack already works.

GiteaMirror commented

2026-05-03 13:52:25 -05:00

@arg0x commented on GitHub (Nov 4, 2024):

I'm using RX 5600M 6GB which seems to have the same arch gfx1010

$amdgpu-arch
gfx1010
gfx90c

I've installed ollama-rocm using aur, set sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat, tried to run with both HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve and HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve

I'm getting this error:

$ ollama run phi3.5
Error: llama runner process has terminated: CUDA error: shared object initialization failed
  current device: 0, in function ggml_cuda_compute_forward at /build/ollama-rocm/src/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:2326
  err
/build/ollama-rocm/src/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error

ollama serve output: https://pastebin.com/b6nSbHUa

@arg0x commented on GitHub (Nov 4, 2024): I'm using RX 5600M 6GB which seems to have the same arch `gfx1010` ```shell $amdgpu-arch gfx1010 gfx90c ``` I've installed ollama-rocm using aur, set `sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat`, tried to run with both `HSA_OVERRIDE_GFX_VERSION="10.1.0" ollama serve` and `HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve` I'm getting this error: ```shell $ ollama run phi3.5 Error: llama runner process has terminated: CUDA error: shared object initialization failed current device: 0, in function ggml_cuda_compute_forward at /build/ollama-rocm/src/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:2326 err /build/ollama-rocm/src/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error ``` ollama serve output: https://pastebin.com/b6nSbHUa

GiteaMirror commented

2026-05-03 13:52:26 -05:00

@fabdacab commented on GitHub (Nov 20, 2024):

Yep, confirming this on win 11.

guy has it working here #2453 (comment)

Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases

Download rocblas.for.gfx1010-.xnack-.with.building.guide.7z from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7.

Go to C:\Users\User\AppData\Local\Programs\Ollama\rocm and copy-replace rocblas.dll alongside library folder from downloaded zip.

Have anyone recreated this in Linux by any chance?
rocblas

Hi! i can reply in ubuntu v24.

The difference here is that you need to create the symlinking with the library inside the ollama directory instead of the rocm opt folder. For be clare: Instead of sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat i do this here: sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat

Check the error return by Ollama when you are running the server and fail, there is the correct folder where is the TensileLibrary.dat file.

See you and thanks @b0o !!!! ❤️🙌

I also have a Radeon 5700XT and a spare pc to experiment then I tried this:

Using a fresh Ubuntu 24.04, I followed the steps to have ROCm installed from the official page. Latest version (6.2 at the time of this post):
https://rocm.docs.amd.com/projects/amdsmi/en/latest/install/install.html#install-amdgpu-driver-and-amd-smi-with-rocm

GPU was detected so everything was looking fine so far.

Then installed Ollama with the default command line command:
curl -fsSL https://ollama.com/install.sh | sh

Kept a couple of terminal instances to monitor CPU (with htop) and GPU (with 'watch' and 'rocm-smi').

Installed and executed llama3.2 model:
ollama run --verbose llama3.2

Just to see if all was working, I tried like this and it was still CPU bounded.

Then applied what @tecnomanu did.
Noticed the folder was empty before I created the symlink there:
sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat

Applied also the environment below:
export HSA_OVERRIDE_GFX_VERSION="10.1.0"

Started the model again and it worked!
Only the GPU got the load this time!
This GPU is not the fastest for this kind of workload maybe is being choked by the rest of the PC I am using an X79 with an old Xeon E5 2650 V2 cpu. Still a lot faster than depending on the tired threads from my old CPU!

@fabdacab commented on GitHub (Nov 20, 2024): > > > Yep, confirming this on win 11. > > > > guy has it working here [#2453 (comment)](https://github.com/ollama/ollama/issues/2453#issuecomment-2236193832) > > > > > > > > > > > > 1. Install ollama prebuild binary installer from here https://github.com/likelovewant/ollama-for-amd/releases > > > 2. Download `rocblas.for.gfx1010-.xnack-.with.building.guide.7z` from https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/tag/v0.5.7. > > > 3. Go to `C:\Users\User\AppData\Local\Programs\Ollama\rocm` and copy-replace `rocblas.dll` alongside `library` folder from downloaded zip. > > > > > > Have anyone recreated this in Linux by any chance? > > rocblas > > Hi! i can reply in ubuntu v24. > > The difference here is that you need to create the symlinking with the library inside the ollama directory instead of the rocm opt folder. For be clare: Instead of `sudo ln -s /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat` i do this here: `sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat` > > Check the error return by Ollama when you are running the server and fail, there is the correct folder where is the TensileLibrary.dat file. > > See you and thanks @b0o !!!! ❤️🙌 I also have a Radeon 5700XT and a spare pc to experiment then I tried this: Using a fresh Ubuntu 24.04, I followed the steps to have ROCm installed from the official page. Latest version (6.2 at the time of this post): **https://rocm.docs.amd.com/projects/amdsmi/en/latest/install/install.html#install-amdgpu-driver-and-amd-smi-with-rocm** GPU was detected so everything was looking fine so far. Then installed Ollama with the default command line command: **curl -fsSL https://ollama.com/install.sh | sh** Kept a couple of terminal instances to monitor CPU (with htop) and GPU (with 'watch' and 'rocm-smi'). Installed and executed llama3.2 model: **ollama run --verbose llama3.2** Just to see if all was working, I tried like this and it was still CPU bounded. Then applied what @tecnomanu did. Noticed the folder was empty before I created the symlink there: **sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat** Applied also the environment below: **export HSA_OVERRIDE_GFX_VERSION="10.1.0"** Started the model again and it worked! Only the GPU got the load this time! This GPU is not the fastest for this kind of workload maybe is being choked by the rest of the PC I am using an X79 with an old Xeon E5 2650 V2 cpu. Still a lot faster than depending on the tired threads from my old CPU!

GiteaMirror commented

2026-05-03 13:52:27 -05:00

@thednp commented on GitHub (Nov 21, 2024):

@fabdacab I can confirm your steps worked for me too. THANK YOU!

@thednp commented on GitHub (Nov 21, 2024): @fabdacab I can confirm your steps worked for me too. THANK YOU!

GiteaMirror commented

2026-05-03 13:52:28 -05:00

@RealDishwash commented on GitHub (Feb 8, 2025):

If anyone's looking for a simpler solution, I installed ramalama.

Then it'll automatically use rocm for you.

ramalama run llama3.2

Here's the results when i bench it using rocm.

❯ ramalama bench llama3.2
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 3B Q4_K - Medium         |   1.87 GiB |     3.21 B | ROCm       | 999 |         pp512 |        690.26 ± 2.30 |
| llama 3B Q4_K - Medium         |   1.87 GiB |     3.21 B | ROCm       | 999 |         tg128 |         77.14 ± 0.09 |

build: aa6fb132 (4607)

@RealDishwash commented on GitHub (Feb 8, 2025): If anyone's looking for a simpler solution, I installed [ramalama](https://github.com/containers/ramalamal). Then it'll automatically use rocm for you. `ramalama run llama3.2` Here's the results when i bench it using rocm. ``` ❯ ramalama bench llama3.2 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: | | llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | ROCm | 999 | pp512 | 690.26 ± 2.30 | | llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | ROCm | 999 | tg128 | 77.14 ± 0.09 | build: aa6fb132 (4607) ```

GiteaMirror commented

2026-05-03 13:52:30 -05:00

@Split7fire commented on GitHub (Feb 8, 2025):

If anyone's looking for a simpler solution, I installed ramalama.

Then it'll automatically use rocm for you.

ramalama run llama3.2

Here's the results when i bench it using rocm.

❯ ramalama bench llama3.2
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 3B Q4_K - Medium         |   1.87 GiB |     3.21 B | ROCm       | 999 |         pp512 |        690.26 ± 2.30 |
| llama 3B Q4_K - Medium         |   1.87 GiB |     3.21 B | ROCm       | 999 |         tg128 |         77.14 ± 0.09 |

build: aa6fb132 (4607)

Tried on my setup but get

ramalama bench llama3.2
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|    1.88 GB/   1.88 GB  10.80 MB/s        0s

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32

and that’s all

@Split7fire commented on GitHub (Feb 8, 2025): > If anyone's looking for a simpler solution, I installed [ramalama](https://github.com/containers/ramalamal). > > Then it'll automatically use rocm for you. > > `ramalama run llama3.2` > > Here's the results when i bench it using rocm. > > ``` > ❯ ramalama bench llama3.2 > ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no > ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no > ggml_cuda_init: found 1 ROCm devices: > Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32 > | model | size | params | backend | ngl | test | t/s | > | ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: | > | llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | ROCm | 999 | pp512 | 690.26 ± 2.30 | > | llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | ROCm | 999 | tg128 | 77.14 ± 0.09 | > > build: aa6fb132 (4607) > ``` Tried on my setup but get ``` ramalama bench llama3.2 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.88 GB/ 1.88 GB 10.80 MB/s 0s ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon RX 5700 XT, gfx1010:xnack- (0x1010), VMM: no, Wave Size: 32 ``` and that’s all

GiteaMirror commented

2026-05-03 13:52:33 -05:00

@RealDishwash commented on GitHub (Feb 8, 2025):

I think you need podman installed for it to work, it runs inside a container.

@RealDishwash commented on GitHub (Feb 8, 2025): I think you need podman installed for it to work, it runs inside a container.

GiteaMirror commented

2026-05-03 13:52:35 -05:00

@Split7fire commented on GitHub (Feb 8, 2025):

I think you need podman installed for it to work, it runs inside a container.

I already have docker. And I saw that ramalama successfully download all required ROCm.

@Split7fire commented on GitHub (Feb 8, 2025): > I think you need podman installed for it to work, it runs inside a container. I already have docker. And I saw that ramalama successfully download all required ROCm.

GiteaMirror commented

2026-05-03 13:52:37 -05:00

@RealDishwash commented on GitHub (Feb 8, 2025):

I think you need podman installed for it to work, it runs inside a container.

I already have docker. And I saw that ramalama successfully download all required ROCm.

You have to export this environment variable to use it with docker.

export RAMALAMA_CONTAINER_ENGINE=docker

@RealDishwash commented on GitHub (Feb 8, 2025): > > I think you need podman installed for it to work, it runs inside a container. > > I already have docker. And I saw that ramalama successfully download all required ROCm. You have to export this environment variable to use it with docker. `export RAMALAMA_CONTAINER_ENGINE=docker`

GiteaMirror commented

2026-05-03 13:52:38 -05:00

@Split7fire commented on GitHub (Feb 8, 2025):

I think you need podman installed for it to work, it runs inside a container.

I already have docker. And I saw that ramalama successfully download all required ROCm.

You have to export this environment variable to use it with docker.

export RAMALAMA_CONTAINER_ENGINE=docker

I checked: my system has podman. I'm using Bluefin-dx (Fedora) and it may be connected with atomic nature of OS.

@Split7fire commented on GitHub (Feb 8, 2025): > > > I think you need podman installed for it to work, it runs inside a container. > > > > > > I already have docker. And I saw that ramalama successfully download all required ROCm. > > You have to export this environment variable to use it with docker. > > `export RAMALAMA_CONTAINER_ENGINE=docker` I checked: my system has podman. I'm using Bluefin-dx (Fedora) and it may be connected with atomic nature of OS.

GiteaMirror commented

2026-05-03 13:52:39 -05:00

@RealDishwash commented on GitHub (Feb 8, 2025):

I think you need podman installed for it to work, it runs inside a container.

I already have docker. And I saw that ramalama successfully download all required ROCm.

You have to export this environment variable to use it with docker.
export RAMALAMA_CONTAINER_ENGINE=docker

I checked: my system has podman. I'm using Bluefin-dx (Fedora) and it may be connected with atomic nature of OS.

😭 I'm on aurora-dx and it's fine for me. Can you try running it on your cpu to see if it'll work on your cpu at least?

ramalama --ngl 0 bench llama3.2

@RealDishwash commented on GitHub (Feb 8, 2025): > > > > I think you need podman installed for it to work, it runs inside a container. > > > > > > > > > I already have docker. And I saw that ramalama successfully download all required ROCm. > > > > > > You have to export this environment variable to use it with docker. > > `export RAMALAMA_CONTAINER_ENGINE=docker` > > I checked: my system has podman. I'm using Bluefin-dx (Fedora) and it may be connected with atomic nature of OS. 😭 I'm on aurora-dx and it's fine for me. Can you try running it on your cpu to see if it'll work on your cpu at least? `ramalama --ngl 0 bench llama3.2`

GiteaMirror commented

2026-05-03 13:52:40 -05:00

@Split7fire commented on GitHub (Feb 9, 2025):

I think you need podman installed for it to work, it runs inside a container.

I already have docker. And I saw that ramalama successfully download all required ROCm.

You have to export this environment variable to use it with docker.
export RAMALAMA_CONTAINER_ENGINE=docker

I checked: my system has podman. I'm using Bluefin-dx (Fedora) and it may be connected with atomic nature of OS.

😭 I'm on aurora-dx and it's fine for me. Can you try running it on your cpu to see if it'll work on your cpu at least?

ramalama --ngl 0 bench llama3.2

How did you install ramalama? pip? containers?

@Split7fire commented on GitHub (Feb 9, 2025): > > > > > I think you need podman installed for it to work, it runs inside a container. > > > > > > > > > > > > I already have docker. And I saw that ramalama successfully download all required ROCm. > > > > > > > > > You have to export this environment variable to use it with docker. > > > `export RAMALAMA_CONTAINER_ENGINE=docker` > > > > > > I checked: my system has podman. I'm using Bluefin-dx (Fedora) and it may be connected with atomic nature of OS. > > 😭 I'm on aurora-dx and it's fine for me. Can you try running it on your cpu to see if it'll work on your cpu at least? > > `ramalama --ngl 0 bench llama3.2` How did you install ramalama? pip? containers?

GiteaMirror commented

2026-05-03 13:52:41 -05:00

@Split7fire commented on GitHub (Feb 9, 2025):

I think you need podman installed for it to work, it runs inside a container.

I already have docker. And I saw that ramalama successfully download all required ROCm.

You have to export this environment variable to use it with docker.
export RAMALAMA_CONTAINER_ENGINE=docker

I checked: my system has podman. I'm using Bluefin-dx (Fedora) and it may be connected with atomic nature of OS.

😭 I'm on aurora-dx and it's fine for me. Can you try running it on your cpu to see if it'll work on your cpu at least?

ramalama --ngl 0 bench llama3.2

strange. Your command output this:

usage: ramalama [-h] [--container] [--debug] [--dryrun] [--engine ENGINE] [--gpu] [--image IMAGE] [--nocontainer] [--runtime {llama.cpp,vllm}] [--store STORE] [-v]
                {help,bench,benchmark,containers,ps,convert,info,list,ls,login,logout,perplexity,pull,push,rm,run,serve,stop,version} ...
ramalama: error: argument subcommand: invalid choice: '0' (choose from help, bench, benchmark, containers, ps, convert, info, list, ls, login, logout, perplexity, pull, push, rm, run, serve, stop, version)

@Split7fire commented on GitHub (Feb 9, 2025): > > > > > I think you need podman installed for it to work, it runs inside a container. > > > > > > > > > > > > I already have docker. And I saw that ramalama successfully download all required ROCm. > > > > > > > > > You have to export this environment variable to use it with docker. > > > `export RAMALAMA_CONTAINER_ENGINE=docker` > > > > > > I checked: my system has podman. I'm using Bluefin-dx (Fedora) and it may be connected with atomic nature of OS. > > 😭 I'm on aurora-dx and it's fine for me. Can you try running it on your cpu to see if it'll work on your cpu at least? > > `ramalama --ngl 0 bench llama3.2` strange. Your command output this: ``` usage: ramalama [-h] [--container] [--debug] [--dryrun] [--engine ENGINE] [--gpu] [--image IMAGE] [--nocontainer] [--runtime {llama.cpp,vllm}] [--store STORE] [-v] {help,bench,benchmark,containers,ps,convert,info,list,ls,login,logout,perplexity,pull,push,rm,run,serve,stop,version} ... ramalama: error: argument subcommand: invalid choice: '0' (choose from help, bench, benchmark, containers, ps, convert, info, list, ls, login, logout, perplexity, pull, push, rm, run, serve, stop, version) ```

GiteaMirror commented

2026-05-03 13:52:42 -05:00

@RealDishwash commented on GitHub (Feb 9, 2025):

@Split7fire Can you open an issue on the bluefin github (or send a message on the ublue discord) for this, might be worth doing it there. I don't want to pollute this issue. Just tag this issue in it so i know where it is.

@RealDishwash commented on GitHub (Feb 9, 2025): @Split7fire Can you open an issue on the bluefin github (or send a message on the ublue discord) for this, might be worth doing it there. I don't want to pollute this issue. Just tag this issue in it so i know where it is.

GiteaMirror commented

2026-05-03 13:52:42 -05:00

@mon-jai commented on GitHub (Mar 17, 2025):

This issue could be resolved if #9650 is merged.

Ollama with Vulkan runs perfectly on my Radeon RX 5700, even though it isn't supported by ROCm. Just run the installer and you are good to go, no manual file swapping required.

For now, we can use the binaries compiled by @McBane87: https://github.com/whyvl/ollama-vulkan/issues/7#issue-2828064858

@mon-jai commented on GitHub (Mar 17, 2025): This issue could be resolved if #9650 is merged. Ollama with Vulkan runs perfectly on my Radeon RX 5700, even though it isn't supported by ROCm. Just run the installer and you are good to go, no manual file swapping required. For now, we can use the binaries compiled by @McBane87: https://github.com/whyvl/ollama-vulkan/issues/7#issue-2828064858

GiteaMirror commented

2026-05-03 13:52:44 -05:00

@Split7fire commented on GitHub (Mar 18, 2025):

@mon-jai Great news, but can you elaborate what installer is mentioned? Official one, or forked one?

@Split7fire commented on GitHub (Mar 18, 2025): @mon-jai Great news, but can you elaborate what installer is mentioned? Official one, or forked one?

GiteaMirror commented

2026-05-03 13:52:44 -05:00

@mon-jai commented on GitHub (Mar 18, 2025):

@Split7fire The "Newest build (v0.6.1)" from the first comment.

@mon-jai commented on GitHub (Mar 18, 2025): @Split7fire The "Newest build (v0.6.1)" from the first comment.

Sign in to join this conversation.

Branches Tags

main

parth-mlx-decode-checkpoints

dhiltgen/ci

hoyyeva/editor-config-repair

parth-launch-codex-app

hoyyeva/fix-codex-model-metadata-warning

hoyyeva/qwen

hoyyeva/launch-backup-ux

parth/hide-claude-desktop-till-release

hoyyeva/opencode-image-modality

parth-add-claude-code-autoinstall

release_v0.22.0

pdevine/manifest-list

codex/fix-codex-model-metadata-warning

pdevine/addressable-manifest

brucemacd/launch-fetch-reccomended

jmorganca/llama-compat

launch-copilot-cli

hoyyeva/opencode-thinking

release_v0.20.7

parth-auto-save-backup

parth-test

jmorganca/gemma4-audio-replacements

fix-manifest-digest-on-pull

hoyyeva/vscode-improve

brucemacd/install-server-wait

brucemacd/download-before-remove

parth/update-claude-docs

parth-anthropic-reference-images-path

brucemac/start-ap-install

pdevine/mlx-update

pdevine/qwen35_vision

drifkin/api-show-fallback

mintlify/image-generation-1773352582

hoyyeva/server-context-length-local-config

jmorganca/faster-reptition-penalties

jmorganca/convert-nemotron

parth-pi-thinking

pdevine/sampling-penalties

jmorganca/fix-create-quantization-memory

dongchen/resumable_transfer_fix

pdevine/sampling-cache-error

jessegross/mlx-usage

hoyyeva/openclaw-config

hoyyeva/app-html

pdevine/qwen3next

brucemacd/sign-sh-install

brucemacd/tui-update

brucemacd/usage-api

jmorganca/launch-empty

fix-app-dist-embed

mxyng/mlx-compile

mxyng/mlx-quant

mxyng/mlx-glm4.7

mxyng/mlx

brucemacd/simplify-model-picker

jmorganca/qwen3-concurrent

fix-glm-4.7-flash-mla-config

drifkin/qwen3-coder-opening-tag

brucemacd/usage-cli

fix-cuda12-fattn-shmem

ollama-imagegen-docs

parth/fix-multiline-inputs

brucemacd/config-docs

mxyng/model-files

mxyng/simple-execute

fix-imagegen-ollama-models

mxyng/async-upload

jmorganca/lazy-no-dtype-changes

imagegen-auto-detect-create

parth/decrease-concurrent-download-hf

fix-mlx-quantize-init

jmorganca/x-cleanup

usage

imagegen-readme

jmorganca/glm-image

mlx-gpu-cd

jmorganca/imagegen-modelfile

parth/agent-skills

parth/agent-allowlist

parth/signed-in-offline

parth/agents

parth/fix-context-chopping

improve-cloud-flow

parth/add-models-websearch

parth/prompt-renderer-mcp

jmorganca/native-settings

jmorganca/download-stream-hash

jmorganca/client2-rebased

brucemacd/oai-chat-req-multipart

jessegross/multi_chunk_reserve

grace/additional-omit-empty

grace/mistral-3-large

mxyng/tokenizer2

mxyng/tokenizer

jessegross/flash

hoyyeva/windows-nacked-app

mxyng/cleanup-attention

grace/deepseek-parser

hoyyeva/remember-unsent-prompt

parth/add-lfs-pointer-error-conversion

parth/olmo2-test2

hoyyeva/ollama-launchagent-plist

nicole/olmo-model

parth/olmo-test

mxyng/remove-embedded

parth/render-template

jmorganca/intellect-3

parth/remove-prealloc-linter

jmorganca/cmd-eval

nicole/nomic-embed-text-fix

mxyng/lint-2

hoyyeva/add-gemini-3-pro-preview

hoyyeva/load-model-list

mxyng/expand-path

mxyng/environ-2

hoyyeva/deeplink-json-encoding

parth/improve-tool-calling-tests

hoyyeva/conversation

hoyyeva/assistant-edit-response

hoyyeva/thinking

origin/brucemacd/invalid-char-i-err

parth/improve-tool-calling

jmorganca/required-omitempty

grace/qwen3-vl-tests

mxyng/iter-client

parth/docs-readme

nicole/embed-test

pdevine/integration-benchstat

parth/remove-generate-cmd

parth/add-toolcall-id

mxyng/server-tests

jmorganca/glm-4.6

jmorganca/gin-h-compat

drifkin/stable-tool-args

pdevine/qwen3-more-thinking

parth/add-websearch-client

nicole/websearch_local

jmorganca/qwen3-coder-updates

grace/deepseek-v3-migration-tests

mxyng/fix-create

jmorganca/cloud-errors

pdevine/parser-tidy

revert-12233-parth/simplify-entrypoints-runner

parth/enable-so-gpt-oss

brucemacd/qwen3vl

jmorganca/readme-simplify

parth/gpt-oss-structured-outputs

revert-12039-jmorganca/tools-braces

mxyng/embeddings

mxyng/gguf

mxyng/benchmark

mxyng/types-null

parth/move-parsing

mxyng/gemma2

jmorganca/docs

mxyng/16-bit

mxyng/create-stdin

pdevine/authorizedkeys

mxyng/quant

parth/opt-in-error-context-window

brucemacd/cache-models

brucemacd/runner-completion

jmorganca/llama-update-6

brucemacd/benchmark-list

brucemacd/partial-read-caps

parth/deepseek-r1-tools

mxyng/omit-array

parth/tool-prefix-temp

brucemacd/runner-test

jmorganca/qwen25vl

brucemacd/model-forward-test-ext

parth/python-function-parsing

jmorganca/cuda-compression-none

drifkin/num-parallel

drifkin/chat-truncation-fix

jmorganca/sync

parth/python-tools-calling

drifkin/array-head-count

brucemacd/create-no-loop

parth/server-enable-content-stream-with-tools

qwen25omni

mxyng/v3

brucemacd/ropeconfig

jmorganca/silence-tokenizer

parth/sample-so-test

parth/sampling-structured-outputs

brucemacd/doc-go-engine

parth/constrained-sampling-json

jmorganca/mistral-wip

brucemacd/mistral-small-convert

parth/sample-unmarshal-json-for-params

brucemacd/jomorganca/mistral

pdevine/bfloat16

jmorganca/mistral

brucemacd/mistral

pdevine/logging

parth/sample-correctness-fix

parth/sample-fix-sorting

jmorgan/sample-fix-sorting-extras

jmorganca/temp-0-images

brucemacd/parallel-embed-models

brucemacd/shim-grammar

jmorganca/fix-gguf-error

bmizerany/nameswork

jmorganca/faster-releases

bmizerany/validatenames

brucemacd/err-no-vocab

brucemacd/rope-config

brucemacd/err-hint

brucemacd/qwen2_5

brucemacd/logprobs

brucemacd/new_runner_graph_bench

progress-flicker

brucemacd/forward-test

brucemacd/go_qwen2

pdevine/gemma2

jmorganca/add-missing-symlink-eval

mxyng/next-debug

parth/set-context-size-openai

brucemacd/next-bpe-bench

brucemacd/next-bpe-test

brucemacd/new_runner_e2e

brucemacd/new_runner_qwen2

pdevine/convert-cohere2

brucemacd/convert-cli

parth/log-probs

mxyng/next-mlx

mxyng/cmd-history

parth/templating

parth/tokenize-detokenize

brucemacd/check-key-register

bmizerany/grammar

jmorganca/vendor-081b29bd

mxyng/func-checks

jmorganca/fix-null-format

parth/fix-default-to-warn-json

jmorganca/qwen2vl

jmorganca/no-concat

parth/cmd-cleanup-SO

brucemacd/check-key-register-structured-err

parth/openai-stream-usage

parth/fix-referencing-so

stream-tools-stop

jmorganca/degin-1

brucemacd/install-path-clean

brucemacd/push-name-validation

brucemacd/browser-key-register

jmorganca/openai-fix-first-message

jmorganca/fix-proxy

jessegross/sample

parth/disallow-streaming-tools

dhiltgen/remove_submodule

jmorganca/ga

jmorganca/mllama

pdevine/newlines

pdevine/geems-2b

jmorganca/llama-bump

mxyng/modelname-7

mxyng/gin-slog

mxyng/modelname-6

jyan/convert-prog

jyan/quant5

paligemma-support

pdevine/import-docs

jmorganca/openai-context

jyan/paligemma

jyan/p2

jyan/palitest

bmizerany/embedspeedup

jmorganca/llama-vit

brucemacd/allow-ollama

royh/ep-methods

royh/whisper

mxyng/api-models

mxyng/fix-memory

jyan/q4_4/8

jyan/ollama-v

royh/stream-tools

roy-embed-parallel

bmizerany/hrm

revert-5963-revert-5924-mxyng/llama3.1-rope

royh/embed-viz

jyan/local2

jyan/auth

jyan/local

jyan/parse-temp

jmorganca/template-mistral

jyan/reord-g

royh-openai-suffixdocs

royh-imgembed

royh-embed-parallel

jyan/quant4

royh-precision

jyan/progress

pdevine/fix-template

jyan/quant3

pdevine/ggla

mxyng/update-registry-domain

jmorganca/ggml-static

mxyng/create-context

jyan/v0.146

mxyng/layers-from-files

build_dist

bmizerany/noseek

royh-ls

royh-name

timeout

mxyng/server-timestamp

bmizerany/nosillyggufslurps

royh-params

jmorganca/llama-cpp-7c26775

royh-openai-delete

royh-show-rigid

jmorganca/enable-fa

jmorganca/no-error-template

jyan/format

royh-testdelete

bmizerany/fastverify

language_support

pdevine/ps-glitches

brucemacd/tokenize

bruce/iq-quants

bmizerany/filepathwithcoloninhost

mxyng/split-bin

bmizerany/client-registry

jmorganca/if-none-match

native

jmorganca/native

jmorganca/batch-embeddings

jmorganca/initcmake

jmorganca/mm

pdevine/showggmlinfo

modenameenforcealphanum

bmizerany/modenameenforcealphanum

jmorganca/done-reason

jmorganca/llama-cpp-8960fe8

ollama.com

bmizerany/filepathnobuild

bmizerany/types/model/defaultfix

rmdisplaylong

nogogen

bmizerany/x

modelfile-readme

bmizerany/replacecolon

jmorganca/limit

jmorganca/execstack

jmorganca/replace-assets

mxyng/tune-concurrency

jmorganca/testing

whitespace-detection

jmorganca/options

upgrade-all

scratch

cuda-search

mattw/airenamer

mattw/allmodelsonhuggingface

mattw/quantcontext

mattw/whatneedstorun

brucemacd/llama-mem-calc

mattw/faq-context

mattw/communitylinks

mattw/noprune

mattw/python-functioncalling

rename

mxyng/install

pulse

remove-first

editor

mattw/selfqueryingretrieval

cgo

mattw/howtoquant

api

matt/streamingapi

format-config

mxyng/extra-args

shell

update-nous-hermes

cp-model

upload-progress

fix-unknown-model

fix-model-names

delete-fix

insecure-registry

ls

deletemodels

progressbar

readme-updates

license-layers

skip-list

list-models

modelpath

matt/examplemodelfiles

distribution

go-opts

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/ollama#63502