[GH-ISSUE #1821] amd64 binary for version 0.1.18 won't work with rocm-6.0.0 #47549

Closed
opened 2026-04-28 04:08:48 -05:00 by GiteaMirror · 3 comments

Originally created by @chirvo on GitHub (Jan 6, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1821

Originally assigned to: @dhiltgen on GitHub.

This happens when using the Linux binary downloaded from [the web page](https://ollama.ai/download/ollama-linux-amd64).

```
2024/01/06 09:03:56 images.go:834: total blobs: 0
2024/01/06 09:03:56 images.go:841: total unused blobs removed: 0
2024/01/06 09:03:56 routes.go:929: Listening on 127.0.0.1:11434 (version 0.1.18)
2024/01/06 09:03:56 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/06 09:03:56 gpu.go:34: Detecting GPU type
2024/01/06 09:03:56 gpu.go:39: CUDA not detected: Unable to load libnvidia-ml.so library to query for Nvidia GPUs: /usr/lib/wsl/lib/libnvidia-ml.so.1: cannot open shared object file: No such file or directory
2024/01/06 09:03:56 gpu.go:48: Radeon GPU detected

...

ollama  | 2024/01/06 09:06:14 llm.go:90: Failed to load dynamic library rocm - falling back to CPU mode Unable to load dynamic library: Unable to load dynamic server library: libhipblas.so.1: cannot open shared object file: No such file or directory
ollama  | 2024/01/06 09:06:14 gpu.go:146: 22476 MB VRAM available, loading up to 609 rocm GPU layers out of 22
```

The binary falls back to CPU because it can't load three libraries (a quick way to confirm which sonames are unresolvable is shown after this list):

- libhipblas.so.1
- librocblas.so.3
- librocsparse.so.0
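
If you want to verify which of these sonames the dynamic linker can actually resolve before touching anything, a quick check like the one below works. This is just a sketch: it assumes the ROCm lib directory is registered with ldconfig (a stock ROCm package install typically adds it via /etc/ld.so.conf.d); adjust if yours isn't.

```bash
# Check which of the sonames the 0.1.18 binary requests are resolvable.
# Assumes the ROCm lib dir is in the ldconfig cache (e.g. via
# /etc/ld.so.conf.d/rocm.conf); run `sudo ldconfig` after any changes.
for lib in libhipblas.so.1 librocblas.so.3 librocsparse.so.0; do
  if ldconfig -p | grep -qF "$lib"; then
    echo "found:   $lib"
  else
    echo "missing: $lib"
  fi
done
```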

**Workaround:**

Just create the pertinent symbolic links for each library:

```bash
cd /opt/rocm-6.0.0/lib
ln -sf libhipblas.so.2.0.60000 libhipblas.so.1
ln -sf librocblas.so.4.0.60000 librocblas.so.3
ln -sf librocsparse.so.1.0.0.60000 librocsparse.so.0
```
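
Note that the versioned filenames above are specific to rocm-6.0.0 and ROCm point releases bump them. A slightly more version-tolerant variant (a sketch only, assuming the usual soname symlinks a stock /opt/rocm-6.0.0/lib layout ships) resolves each link target by glob instead of hardcoding it:

```bash
# Sketch: link each soname the ollama build requests to whatever
# major-version library this ROCm install actually ships. The glob
# patterns are assumptions based on a stock rocm-6.0.0 install.
cd /opt/rocm-6.0.0/lib
ln -sf "$(ls libhipblas.so.2* | head -n1)" libhipblas.so.1
ln -sf "$(ls librocblas.so.4* | head -n1)" librocblas.so.3
ln -sf "$(ls librocsparse.so.1.* | head -n1)" librocsparse.so.0
```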

After linking the libraries, everything should work.

```
2024/01/06 09:56:20 images.go:834: total blobs: 5
2024/01/06 09:56:20 images.go:841: total unused blobs removed: 0
2024/01/06 09:56:20 routes.go:929: Listening on 127.0.0.1:11434 (version 0.1.18)
2024/01/06 09:56:20 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/06 09:56:20 gpu.go:34: Detecting GPU type
2024/01/06 09:56:20 gpu.go:39: CUDA not detected: Unable to load libnvidia-ml.so library to query for Nvidia GPUs: /usr/lib/wsl/lib/libnvidia-ml.so.1: cannot open shared object file: No such file or directory
2024/01/06 09:56:20 gpu.go:48: Radeon GPU detected
[GIN] 2024/01/06 - 09:56:35 | 200 |       31.67µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/06 - 09:56:35 | 200 |     402.531µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/06 - 09:56:35 | 200 |     198.656µs |       127.0.0.1 | POST     "/api/show"
2024/01/06 09:56:36 shim_ext_server_linux.go:24: Updating PATH to /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tmp/ollama4208096896/rocm
2024/01/06 09:56:36 shim_ext_server.go:92: Loading Dynamic Shim llm server: /tmp/ollama4208096896/rocm/libext_server.so
2024/01/06 09:56:36 gpu.go:146: 22522 MB VRAM available, loading up to 610 rocm GPU layers out of 22
2024/01/06 09:56:36 ext_server_common.go:143: Initializing internal llama server
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: Radeon RX 7900 XTX, compute capability 11.0
```

Just wanted to document this in case somebody else has the same issue.

Cheers, and thank you all for the amazing work you're doing with ollama.


@MauriceKayser commented on GitHub (Jan 7, 2024):

I had the same issue with the same card and ROCm version, and also tried the symbolic link trick, but it still seemed to use the CPU instead of the GPU (checked with `htop` + `nvtop`), even though it printed:

```
gpu.go:34: Detecting GPU type
gpu.go:39: CUDA not detected: nvml vram init failure: 9
gpu.go:48: Radeon GPU detected
..
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: Radeon RX 7900 XTX, compute capability 11.0
..
llm_load_tensors: ggml ctx size =    0.38 MiB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required  = 8781.79 MiB
llm_load_tensors: offloading 21 repeating layers to GPU
llm_load_tensors: offloaded 21/33 layers to GPU
llm_load_tensors: VRAM used: 16434.47 MiB
..
llama_new_context_with_model: compute buffer total size = 187.22 MiB
llama_new_context_with_model: VRAM scratch buffer: 184.04 MiB
gpu.go:146: 22707 MB VRAM available, loading up to 21 rocm GPU layers out of 32
ext_server_common.go:143: Initializing internal llama server
..
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 168.00 MB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_build_graph: non-view tensors processed: 1124/1124
llama_new_context_with_model: compute buffer total size = 187.22 MiB
llama_new_context_with_model: VRAM scratch buffer: 184.04 MiB
llama_new_context_with_model: total VRAM used: 16786.50 MiB (model: 16434.47 MiB, context: 352.04 MiB)
ext_server_common.go:151: Starting internal llama main loop
ext_server_common.go:165: loaded 0 images
```

![usage](https://i.imgur.com/2wvSKRG.png)
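
For what it's worth, `htop`/`nvtop` don't always tell the whole story on AMD cards; `rocm-smi` reports GPU and VRAM utilization directly. A generic check, not ollama-specific, assuming the rocm-smi that ships with the ROCm install is on PATH:

```bash
# Poll AMD GPU busy % and VRAM usage once a second while a model runs.
# rocm-smi ships with ROCm; check its --help if these flags differ
# in your version.
watch -n 1 rocm-smi --showuse --showmemuse
```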


@dhiltgen commented on GitHub (Jan 7, 2024):

At the moment, the official binary build for 0.1.18 is linked against ROCm v5, not v6. According to the AMD docs, v6 and v5 are not compatible. I'm working on PR #1819 to support concurrent v5 and v6 libraries, which should solve this.
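
As a side note, you can see which ROCm sonames a given build actually requests (and therefore which major ROCm it was linked against) by reading the dynamic section of the extracted shim. A sketch; the /tmp path is an example based on the logs above, since ollama unpacks the shim into a randomly named temp dir at startup:

```bash
# List the NEEDED entries of the rocm shim. Per this thread,
# libhipblas.so.1 indicates a ROCm 5 link and libhipblas.so.2 a
# ROCm 6 link. The path is an example; adjust to the real temp dir.
readelf -d /tmp/ollama*/rocm/libext_server.so | grep NEEDED
```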


@chirvo commented on GitHub (Jan 7, 2024):

Hmm. Curious.

Here's a screenshot of ollama running llama2-uncensored, describing in detail all the episodes of "El Quijote de La Mancha". For monitoring I'm using btop++, nvtop, and radeontop.

![294792773-cb7b6c1c-a3bf-4679-9ae0-eb7fdde777d0](https://github.com/jmorganca/ollama/assets/1088243/95b53d8b-1051-4730-bc59-6a047baf4a3f)

You can see that ollama is using the video card.

There's another way I can tell my video card is being used, but it's totally unscientific: it has to do with a sound my PSU makes when the GPU is demanding more power.

I can say that it is working, at least for me.

A small note here: I'm running `ollama serve` in a Docker container, based on the ubuntu:jammy image and a pristine install of rocm-6.0.0.
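
For anyone wanting to reproduce that setup, AMD GPU passthrough into a container usually looks roughly like this; the image name is a placeholder for whatever jammy-based image you build with rocm-6.0.0 and the ollama binary inside, not an official image:

```bash
# Sketch of running the server in a container with the AMD GPU exposed.
# /dev/kfd and /dev/dri are the standard ROCm passthrough devices;
# "my-ollama-rocm" is a placeholder image assumed to run `ollama serve`.
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  -p 11434:11434 \
  my-ollama-rocm
```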

**EDIT:** never mind the Llama output, it's repeating itself to death.
**EDIT 2:** edited the image to remove personal info. My bad.
