[GH-ISSUE #2869] Ollama doesn't use Radeon RX 6600 #27511

Closed
opened 2026-04-22 04:54:21 -05:00 by GiteaMirror · 22 comments

Originally created by @nameiwillforget on GitHub (Mar 1, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2869

Originally assigned to: @dhiltgen on GitHub.

I'm using Arch Linux with the latest updates installed and ollama installed from its AUR package. When I use the Smaug model, it uses my CPU considerably but my GPU not at all:
![amdgpu](https://github.com/ollama/ollama/assets/81373487/be629472-a4eb-4f31-b8e9-726e2f9a8c21)
I put the output of `ollama serve` and of ollama running Smaug into files:
[ollama.txt](https://github.com/ollama/ollama/files/14466737/ollama.txt)
[smaug.txt](https://github.com/ollama/ollama/files/14466741/smaug.txt)
I installed CUDA because I thought for a moment it was needed, but I don't think that's the reason it doesn't work.
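For reference, a rough way to confirm the GPU really is idle while the model generates (a sketch, assuming the ROCm userspace tools such as `rocm-smi` are installed) is to watch VRAM use and GPU load in a second terminal:

```
# Assumes rocm-smi is installed; refresh every second while the model is running.
watch -n 1 rocm-smi --showuse --showmeminfo vram
```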


@dhiltgen commented on GitHub (Mar 1, 2024):

Can you run the server with `OLLAMA_DEBUG=1` set so we can see some more diagnostic information about why it wasn't able to initialize the GPU?


@nameiwillforget commented on GitHub (Mar 1, 2024):

Alright:
[ollama.txt](https://github.com/ollama/ollama/files/14466947/ollama.txt)


@dhiltgen commented on GitHub (Mar 1, 2024):

The attached log doesn't seem to have debug enabled. Try...

```
sudo systemctl stop ollama
OLLAMA_DEBUG=1 ollama serve
```
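If the server is normally managed by systemd, a persistent override is another option. This is a sketch based on the approach described in Ollama's FAQ; the unit name assumes the standard Linux install:

```
# Add the debug flag to the systemd unit instead of running ollama serve by hand.
sudo systemctl edit ollama.service
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama
journalctl -u ollama -f   # follow the server log with debug output enabled
```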

@tannisroot commented on GitHub (Mar 2, 2024):

If this is the model you are trying to run:
https://ollama.com/sammcj/smaug
note that it is 44GB in size.
The RX 6600 has only 8GB of VRAM.
I've found that Ollama won't use the GPU (at least on Linux) if the model can't fit entirely into the GPU's VRAM; it falls back to the CPU instead.
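A quick way to sanity-check the fit before expecting GPU offload (a sketch, assuming the ROCm tools are installed) is to compare the model's on-disk size with the card's VRAM:

```
ollama list                    # shows each pulled model and its size on disk
rocm-smi --showmeminfo vram    # shows total and used VRAM on the Radeon card
```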


@nameiwillforget commented on GitHub (Mar 2, 2024):

Alright, here it is again:
[ollama.txt](https://github.com/ollama/ollama/files/14467373/ollama.txt)
Looks the same to me though.


@nameiwillforget commented on GitHub (Mar 2, 2024):

> If this is the model you are trying to run: https://ollama.com/sammcj/smaug note that it is 44GB in size. The RX 6600 has only 8GB of VRAM. I've found that Ollama won't use the GPU (at least on Linux) if the model can't fit entirely into the GPU's VRAM; it falls back to the CPU instead.

Oh, I see. Is this intended behavior?


@dhiltgen commented on GitHub (Mar 2, 2024):

Hmm... your output doesn't look like what I'm expecting to see as ollama starts up when we're doing initial GPU discovery. Here's what I see with 0.1.27 on a system with an RX 7600:

```
% OLLAMA_DEBUG=1 ollama serve
time=2024-03-02T17:03:56.859Z level=INFO source=images.go:710 msg="total blobs: 11"
time=2024-03-02T17:03:56.860Z level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-03-02T17:03:56.861Z level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
time=2024-03-02T17:03:56.861Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-02T17:03:59.554Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v6 rocm_v5 cpu cpu_avx]"
time=2024-03-02T17:03:59.554Z level=DEBUG source=payload_common.go:147 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-03-02T17:03:59.554Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-03-02T17:03:59.554Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-02T17:03:59.554Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /home/daniel/libnvidia-ml.so*]"
time=2024-03-02T17:03:59.555Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-03-02T17:03:59.555Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-03-02T17:03:59.555Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/opt/rocm*/lib*/librocm_smi64.so* /home/daniel/librocm_smi64.so*]"
time=2024-03-02T17:03:59.555Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.6.0.60002 /opt/rocm-6.0.2/lib/librocm_smi64.so.6.0.60002]"
wiring rocm management library functions in /opt/rocm/lib/librocm_smi64.so.6.0.60002
dlsym: rsmi_init
dlsym: rsmi_shut_down
dlsym: rsmi_dev_memory_total_get
dlsym: rsmi_dev_memory_usage_get
dlsym: rsmi_version_get
dlsym: rsmi_num_monitor_devices
dlsym: rsmi_dev_id_get
dlsym: rsmi_dev_name_get
dlsym: rsmi_dev_brand_get
dlsym: rsmi_dev_vendor_name_get
dlsym: rsmi_dev_vram_vendor_get
dlsym: rsmi_dev_serial_number_get
dlsym: rsmi_dev_subsystem_name_get
dlsym: rsmi_dev_vbios_version_get
time=2024-03-02T17:03:59.558Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-03-02T17:03:59.558Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-02T17:03:59.558Z level=INFO source=gpu.go:155 msg="AMD Driver: 6.3.6"
time=2024-03-02T17:03:59.558Z level=DEBUG source=amd.go:76 msg="malformed gfx_target_version 0"
discovered 1 ROCm GPU Devices
[0] ROCm device name: Navi 33 [Radeon RX 7700S/7600/7600S/7600M XT/PRO W7600]
[0] ROCm brand: Navi 33 [Radeon RX 7700S/7600/7600S/7600M XT/PRO W7600]
[0] ROCm vendor: Advanced Micro Devices, Inc. [AMD/ATI]
[0] ROCm VRAM vendor: samsung
rsmi_dev_serial_number_get failed: 2
[0] ROCm subsystem name: RX 7600 Challenger OC
[0] ROCm vbios version: 113-D7451000-0001
[0] ROCm totalMem 8573157376
[0] ROCm usedMem 27176960
time=2024-03-02T17:03:59.561Z level=DEBUG source=gpu.go:254 msg="rocm detected 1 devices with 7126M available memory"
```

That said, yes, if you're attempting to load a 44G model into an 8G GPU, then most of the work is being done by the CPU.
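For what it's worth, the number of layers offloaded can also be pinned explicitly through the API's `num_gpu` option, which makes it easier to see how much of a too-large model actually lands on the GPU. This is only an illustrative request; the model name and layer count are placeholders:

```
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello",
  "options": { "num_gpu": 10 }
}'
```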


@Jaspix commented on GitHub (Mar 5, 2024):

I'm running ollama from the official Arch package and facing the same issue. I got this log, and all I can see is that both of my GPUs are discovered; however, whenever I run a model, even small ones, it defaults to the CPU.

```
discovered 2 ROCm GPU Devices
[0] ROCm device name: Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT]
[0] ROCm brand: Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT]
[0] ROCm vendor: Advanced Micro Devices, Inc. [AMD/ATI]
[0] ROCm VRAM vendor: samsung
rsmi_dev_serial_number_get failed: 2
[0] ROCm subsystem name: Radeon RX 6800M
[0] ROCm vbios version: SWBRT79208.001
[0] ROCm totalMem 12868124672
[0] ROCm usedMem 16650240
[1] ROCm device name: Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
[1] ROCm brand: Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
[1] ROCm vendor: Advanced Micro Devices, Inc. [AMD/ATI]
rsmi_dev_vram_vendor_get failed: 2
rsmi_dev_serial_number_get failed: 2
[1] ROCm subsystem name: Radeon Vega 8
[1] ROCm vbios version: 113-CEZANNE-018
[1] ROCm totalMem 536870912
[1] ROCm usedMem 524304384
[1] ROCm integrated GPU
time=2024-03-04T20:33:33.692-05:00 level=INFO source=gpu.go:199 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
time=2024-03-04T20:33:33.692-05:00 level=DEBUG source=gpu.go:254 msg="rocm detected 2 devices with 10208M available memory"
```

@tannisroot commented on GitHub (Mar 5, 2024):

@Jaspix this is just a guess, but could it be that it tries to use the integrated graphics first, runs out of memory, and falls back to the CPU?


@Jaspix commented on GitHub (Mar 5, 2024):

> @Jaspix this is just a guess, but could it be that it tries to use the integrated graphics first, runs out of memory, and falls back to the CPU?

Possibly, but that would mean the program is confusing the dedicated GPU with the integrated one, since `ROCR_VISIBLE_DEVICES=0` means it's using device 0, I suppose?
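One way to test that guess is to hide the integrated GPU from the ROCm runtime and see whether the behavior changes. A sketch using the standard `ROCR_VISIBLE_DEVICES` variable; the device index assumes the ordering in the log above:

```
rocminfo | grep -E "Marketing Name|gfx"    # confirm which agent is the Navi 22 card
sudo systemctl stop ollama
ROCR_VISIBLE_DEVICES=0 OLLAMA_DEBUG=1 ollama serve
```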


@totterman commented on GitHub (Mar 5, 2024):

I'm running ollama from the official Arch package and facing the same issue: the RX 7600 is not detected. Perhaps because the GPU libraries are not discovered?

```
$ OLLAMA_DEBUG=1 ollama serve
time=2024-03-05T10:18:23.527Z level=INFO source=images.go:710 msg="total blobs: 0"
time=2024-03-05T10:18:23.527Z level=INFO source=images.go:717 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/jmorganca/ollama/server.ChatHandler (6 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
time=2024-03-05T10:18:23.527Z level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
time=2024-03-05T10:18:23.528Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-05T10:18:23.649Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cpu]"
time=2024-03-05T10:18:23.649Z level=DEBUG source=payload_common.go:147 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-03-05T10:18:23.650Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-03-05T10:18:23.650Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-05T10:18:23.650Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /home/pbt/src/ai/poro/libnvidia-ml.so*]"
time=2024-03-05T10:18:23.656Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-03-05T10:18:23.656Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-03-05T10:18:23.656Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/opt/rocm*/lib*/librocm_smi64.so* /home/pbt/src/ai/poro/librocm_smi64.so*]"
time=2024-03-05T10:18:23.656Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-03-05T10:18:23.656Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-05T10:18:23.656Z level=INFO source=routes.go:1042 msg="no GPU detected"
```

@jmorganca commented on GitHub (Mar 12, 2024):

Hi there, would it be possible to:

* Try the new 0.1.29 pre-release with AMD Preview: https://github.com/ollama/ollama/releases/tag/v0.1.29
* The RX 6600 isn't officially supported by AMD ROCm, but you can override this by setting `HSA_OVERRIDE_GFX_VERSION="10.3.0"` (you can see how to set this [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)).

This should provide you GPU acceleration on AMD. Let me know if that doesn't work for any reason!
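For a systemd-managed Linux install, the override from the FAQ link can be made persistent roughly like this (a sketch; the GFX version string is the one suggested above):

```
sudo systemctl edit ollama.service
# add under [Service]:
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
sudo systemctl restart ollama

# or, for a one-off foreground run:
HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve
```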


@segg21 commented on GitHub (Mar 12, 2024):

> Hi there, would it be possible to:
>
> * Try the new 0.1.29 pre-release with AMD Preview: https://github.com/ollama/ollama/releases/tag/v0.1.29
> * The RX 6600 isn't officially supported by AMD ROCm but you can override this by setting `HSA_OVERRIDE_GFX_VERSION="10.3.0"` (you can see how to set this [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)).
>
> This should provide you GPU acceleration on AMD. Let me know if that doesn't work for any reason!

Setting the environment variable didn't work. :/


@dhiltgen commented on GitHub (Mar 13, 2024):

@totterman your logs indicate the ollama binary was compiled without GPU support: `Dynamic LLM libraries [cpu_avx cpu_avx2 cpu]`. It's missing CUDA and ROCm. The official builds we host on GitHub contain all CPU types and GPU types in a single release.

@segg21 I just fixed some defects with the iGPU detection logic which might be related to your problem. We'll be updating the binaries for 0.1.29 (still in pre-release) later today to pick up that fix. Please give that a try, and if you're still seeing problems, please share the server log so we can see what's going on.
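A quick way to confirm which backends a given binary was built with is to look for the "Dynamic LLM libraries" line in the startup log (a sketch, assuming the server runs under systemd):

```
journalctl -u ollama --no-pager | grep "Dynamic LLM libraries"
# An official release build lists rocm_v5/rocm_v6 and cuda_v11 alongside the cpu variants.
```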


@dhiltgen commented on GitHub (Mar 18, 2024):

@segg21 0.1.29 is now the latest official release https://github.com/ollama/ollama/releases
If you installed one of the earlier pre-release builds, please re-install.


@dhiltgen commented on GitHub (Mar 19, 2024):

@segg21 can you share your server log?


@segg21 commented on GitHub (Mar 19, 2024):

> @segg21 can you share your server log?

Sorry for the delay. I meant to provide it with my previous message and forgot.
I'm running into another issue after I uninstalled to attempt again.

I'm attempting to use the llama2 model, which I run with `ollama run llama2`. I haven't had this issue before, and I've restarted my PC. `netstat` also doesn't show the port being in use, but I'm now getting the error
`Error: Post "http://127.0.0.1:11434/api/chat": read tcp 127.0.0.1:50125->127.0.0.1:11434: wsarecv: An existing connection was forcibly closed by the remote host.`

Here's the [server.log](https://github.com/ollama/ollama/files/14654783/server.log)

Additionally, I noticed that it's trying to find the file `C:\Users\****\AppData\Local\Programs\Ollama\rocm\/rocblas/library/TensileLibrary.dat`, which doesn't seem to exist in this folder :/


@dhiltgen commented on GitHub (Mar 20, 2024):

Unfortunately the ROCm library does not yet support your GPU (gfx1032), and the override mechanism is only possible on Linux (see #3107).

The system should detect this and fallback to CPU mode. Is it possible you're running an older pre-release of 0.1.29? Can you uninstall and re-install the latest binaries from https://github.com/ollama/ollama/releases/tag/v0.1.29 just to make sure? If you still see a crash instead of falling back to CPU, that's a bug we want to fix.


@ftoppi commented on GitHub (Mar 20, 2024):

> Unfortunately the ROCm library does not yet support your GPU (gfx1032), and the override mechanism is only possible on Linux (see #3107).

Hi @dhiltgen, I'm trying to understand AMD GPU support. gfx1032 has "runtime support" according to the [AMD website](https://rocm.docs.amd.com/en/docs-5.7.1/release/windows_support.html). Does it only work with cards that have "HIP SDK" support?

Thanks for your work :)


@dhiltgen commented on GitHub (Mar 20, 2024):

@ftoppi yes, that's correct. The HIP SDK math libraries are what make LLMs work on GPUs.


@muhammedaligurdal commented on GitHub (Jul 26, 2024):

> @ftoppi yes, that's correct. The HIP SDK math libraries are what make LLMs work on GPUs.

My graphics card is an RX 6600. It saddens me that this graphics card isn't supported. I tried many methods and failed.


@diogovalada commented on GitHub (Sep 6, 2024):

I am trying on my laptop, with an AMD RX 6700S and Windows 11, but it also doesn't use my GPU, only the CPU.
Ollama 0.3.9

When I set the debug environment variable and run `ollama run qwen2-math`, I get:

```
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/jmorganca/ollama/server.ChatHandler (6 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN] 2024/03/02 - 01:40:27 | 200 |      27.997µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/03/02 - 01:40:27 | 200 |     407.172µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/03/02 - 01:40:27 | 200 |     119.568µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/03/02 - 01:41:01 | 200 |  33.44071703s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/03/02 - 01:42:42 | 200 |         1m21s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/03/02 - 01:42:42 | 400 |          1m4s |       127.0.0.1 | POST     "/api/chat"
```

Do I need to install the ROCm HIP SDK, or do the necessary resources already come bundled with Ollama?
Any ideas?
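On Windows, a rough equivalent of the Linux debug steps above is to quit the tray app and run the server in the foreground with debug logging, so the GPU discovery section shows up. A PowerShell sketch; the log location assumes the default install:

```
$env:OLLAMA_DEBUG = "1"
ollama serve
# Existing logs are under %LOCALAPPDATA%\Ollama (app.log, server.log).
```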
