[GH-ISSUE #2166] ROCm container CUDA error #26998

Closed
opened 2026-04-22 03:50:23 -05:00 by GiteaMirror · 4 comments

Originally created by @Eelviny on GitHub (Jan 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2166

Originally assigned to: @dhiltgen on GitHub.

I'm attempting to use an AMD Radeon RX 7900 XT on ollama v0.1.21 in a container that I built from the Dockerfile. I use podman to build and run containers, and my OS is Bluefin (Fedora Silverblue spin). I'm unsure whether this is an issue because I'm missing something on my host OS, or an issue with the container.

Here's my run command: `podman run -d --privileged --device /dev/kfd:/dev/kfd -v ollama:/root/.ollama -p 11434:11434 -e OLLAMA_DEBUG=1 --name ollama localhost/ollama:v0.1.21`

Ollama starts up fine, but when I attempt to run the model `codellama:13b-instruct`, ollama crashes. I'm running it with `OLLAMA_DEBUG=1`; here's the full run:

https://gist.github.com/Eelviny/1d43d6324f68977bd1c653e0b78eca03
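
A side note on the run command above: with `--privileged`, podman already exposes all host devices, but the documented, less-privileged way to run a ROCm container is to drop `--privileged` and pass through both `/dev/kfd` and the `/dev/dri` render nodes explicitly. A minimal sketch of that variant (same image tag as above):

```
podman run -d \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 \
  -e OLLAMA_DEBUG=1 \
  --name ollama localhost/ollama:v0.1.21
```

On SELinux-enforcing hosts such as Fedora/Bluefin, `--security-opt label=disable` may also be needed for the container to reach the device nodes.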

What's interesting is that if I run `rocm-smi` in the container, I get an error, so I suspect it might be more of a container issue than an ollama issue:

```
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  [Model : Revision]    Temp    Power    Partitions      SCLK   MCLK     Fan  Perf  PwrCap       VRAM%  GPU%
        Name (20 chars)       (Edge)  (Avg)    (Mem, Compute)
====================================================================================================================
Traceback (most recent call last):
  File "/usr/bin/rocm-smi", line 3926, in <module>
    showAllConcise(deviceList)
  File "/usr/bin/rocm-smi", line 1827, in showAllConcise
    zip(range(len(max_widths)), values['card%s' % (str(device))])), None)
  File "/usr/bin/rocm-smi", line 693, in printLog
    print(logstr + '\n', end='')
UnicodeEncodeError: 'ascii' codec can't encode character '\xb0' in position 34: ordinal not in range(128)
```
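
The `'\xb0'` here is the degree sign in the temperature column, so this particular traceback points at a locale problem rather than a GPU problem: no UTF-8 locale is set inside the container, so Python falls back to ASCII for stdout. A minimal sketch of a workaround, assuming the image ships the usual glibc `C.UTF-8` locale:

```
# Force a UTF-8 locale for the rocm-smi invocation
podman exec -e LANG=C.UTF-8 ollama rocm-smi
# Or, narrower: override only Python's stdout/stderr encoding
podman exec -e PYTHONIOENCODING=utf-8 ollama rocm-smi
```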

I then tried to build the main branch at f63dc2d (#2162), but this exhibited completely different behaviour - no logging whatsoever; when trying to do `ollama run` I would just get the spinning loading symbol forever.

@Eelviny commented on GitHub (Jan 24, 2024):

Update: my last comment about the main branch producing no logging was because I hadn't built the container with all the libraries - I've now tried again without modifying the Dockerfile. Here's a new gist that also includes the GPU logging:

https://gist.github.com/Eelviny/a62845933b564128d502b62eb999eeb2

@dhiltgen commented on GitHub (Jan 25, 2024):

Thanks for the log!
`discovered 2 ROCm GPU Devices` likely indicates an iGPU, which is being tracked in #2054. Can you try the workaround noted in that issue and see if it works for your setup?

@Eelviny commented on GitHub (Jan 25, 2024):

Thanks! Didn't spot that issue. `podman run -d --privileged --device /dev/kfd -v ollama:/root/.ollama -p 11434:11434 -e OLLAMA_DEBUG=1 -e ROCR_VISIBLE_DEVICES="0" --name ollama dhiltgen/ollama:0.1.21-rc4` is working great.
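
For anyone landing here later: `ROCR_VISIBLE_DEVICES` takes a comma-separated list of device indices, and index `0` is assumed above to be the discrete 7900 XT. One way to check which index maps to which card (a sketch, reusing the locale workaround from earlier since `rocm-smi` otherwise crashes in this container):

```
# Card indices printed here correspond to ROCR_VISIBLE_DEVICES values
podman exec -e LANG=C.UTF-8 ollama rocm-smi --showproductname
```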

Closing this ticket as a duplicate.

@meminens commented on GitHub (Jan 30, 2024):

I installed rocm and ollama using pacman (instead of podman/docker) on Arch Linux. How can I set `ROCR_VISIBLE_DEVICES` to `0`? I want ollama to use the dedicated GPU (AMD 7900 XTX) instead of the iGPU.
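
A sketch of one way to do this for a native install, assuming the Arch package runs Ollama under a systemd unit named `ollama.service` (check with `systemctl status ollama`):

```
# Create a drop-in override that sets the environment variable
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="ROCR_VISIBLE_DEVICES=0"
# Save, then restart the service to pick it up
sudo systemctl restart ollama
```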

Reference: github-starred/ollama#26998