feat: ROCm /w RAG and SentenceTransformers #3204

Closed
opened 2025-11-11 15:25:34 -06:00 by GiteaMirror · 22 comments
Owner

Originally created by @Schwenn2002 on GitHub (Jan 6, 2025).

Unfortunately, ROCm is only supported for Ollama, but that works fine.

For RAG, SentenceTransformers is unfortunately only implemented with CUDA; otherwise you only have the CPU. SentenceTransformers with CUDA is also faster than Ollama with ROCm, and the reranking also runs on the CPU.

In any case, I observe that about 100 documents (100 kB to 10 MB each) are processed for a very long time in the RAG stage before the LLM (regardless of whether 3B, 8B, or 22B) is executed on a Radeon Pro W7900 with 48 GB VRAM.

PyTorch supports ROCm: https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html

GiteaMirror added the good first issue and help wanted labels 2025-11-11 15:25:34 -06:00

@Schwenn2002 commented on GitHub (Jan 7, 2025):

Another Link:

https://rocm.blogs.amd.com/artificial-intelligence/sentence_transformers_amd/README.html


@Schwenn2002 commented on GitHub (Jan 17, 2025):

If I rebuild the Docker image with the attached Dockerfile (docker-compose up --build open-webui-rocm) and then call the embedding model via the console, it is loaded on the GPU; evidently the ROCm build of torch also uses the CUDA API.
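The "ROCm looks like CUDA" behaviour can be made explicit: ROCm builds of PyTorch populate `torch.version.hip` while still answering `torch.cuda.is_available()` through the HIP backend, which is why the console test below reports CUDA as active. A stdlib-only sketch of telling the builds apart (the helper is mine; the `torch.version` attributes are real torch fields):

```python
def backend_name(hip_version, cuda_version):
    """Classify a PyTorch build the way torch.version exposes it:
    ROCm wheels set torch.version.hip, CUDA wheels set torch.version.cuda,
    CPU-only wheels set neither. Illustrative helper, not open-webui code."""
    if hip_version:
        return "rocm"
    if cuda_version:
        return "cuda"
    return "cpu"

# A ROCm wheel reports a HIP version but no CUDA toolkit version:
print(backend_name("6.2.41133", None))  # rocm
```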

docker-compose.yaml:

```
services:
  open-webui-rocm:
    build:
      context: .
      dockerfile: ./open-webui-rocm/Dockerfile
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui-rocm
    ipc: host
    volumes:
      - /data/open-webui-rocm/data:/app/backend/data
    ports:
      - 4443:8080
    environment:
      - 'WEBUI_NAME=xxxx'
      - 'WEBUI_URL=https://xxxx'
      - 'OLLAMA_BASE_URL=http://x.x.x.x:11434'
      - 'WEBUI_SECRET_KEY='
      - 'ROCR_VISIBLE_DEVICES=1'
      - 'HSA_OVERRIDE_GFX_VERSION=11.0.0'
      - 'HSA_ENABLE_SDMA=0'
      - 'RAG_EMBEDDING_MODEL=ibm-granite/granite-embedding-278m-multilingual'
      - 'GLOBAL_LOG_LEVEL=DEBUG'
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    group_add:
      - video
      - 993
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
```

Dockerfile

```
FROM ghcr.io/open-webui/open-webui:main

RUN apt update
RUN apt install -y wget
RUN apt install -y gpg

RUN apt install -y python3-setuptools python3-wheel libpython3.11

WORKDIR /app/backend
RUN wget https://repo.radeon.com/amdgpu-install/6.3.1/ubuntu/jammy/amdgpu-install_6.3.60301-1_all.deb
RUN apt install -y ./amdgpu-install_6.3.60301-1_all.deb
RUN amdgpu-install -y --usecase=rocm,hip,opencl --no-dkms

RUN pip3 uninstall -y torch torchvision torchaudio
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2 --no-cache-dir

RUN python -c "from datasets import load_dataset"
RUN python -c "from sentence_transformers import InputExample, util"
RUN python -c "from torch.utils.data import DataLoader"
RUN python -c "from torch import nn"
RUN python -c "from sentence_transformers import losses"
RUN python -c "from sentence_transformers import SentenceTransformer, models"
```

ROCm test with torch and CUDA:

```
docker exec -it open-webui-rocm /bin/bash

root@3ac111a1e730:/app/backend# python
Python 3.11.11 (main, Dec 24 2024, 22:24:26) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sentence_transformers import SentenceTransformer
>>> import torch
>>> print("CUDA aktiviert:", torch.cuda.is_available())
CUDA aktiviert: True
>>> model = SentenceTransformer('ibm-granite/granite-embedding-278m-multilingual')
>>> print("Model runs on GPU:", next(model.parameters()).is_cuda)
Model runs on GPU: True
```

![Image](https://github.com/user-attachments/assets/c853ee85-7170-4138-bcc6-bc893136eece)
![Image](https://github.com/user-attachments/assets/e1fe865d-cb07-43c2-8e59-b54a170ffc5f)

open-webui

Open WebUI still uses the CPU (why?). Is it hard-coded that the model runs on the CPU?


@Schwenn2002 commented on GitHub (Jan 18, 2025):

Additional information:

For Python 3.11 you need the installation package for ROCm 6.3. For torch 2.5.1 there are only packages for ROCm 6.2. According to my research, the ROCm versions are compatible. This is also confirmed by the fact that SentenceTransformers loads the embedding model with CUDA support.

How do I tell open-webui to use the GPU?


@oatmealm commented on GitHub (Jan 18, 2025):

USE_DOCKER_CUDA worked for me, in that Open WebUI acknowledged and confirmed that the GPU is enabled, but on my hardware it only worked once: after restarting the server it simply hangs while attempting to load the libraries. You also need to make sure pip installed the corresponding PyTorch libraries built for ROCm, of course.


@Schwenn2002 commented on GitHub (Jan 18, 2025):

Thank you very much; now the Open WebUI Docker image actually runs with ROCm.

Perhaps an open-webui:rocm image can be built?

The adjustments are specified in the Dockerfile above (--usecase=rocm should be sufficient for amdgpu-install).


@oatmealm commented on GitHub (Jan 18, 2025):

You can also use it outside Docker; it works the same. But, as mentioned, for me it caused problems. I think you can simply adapt the existing Dockerfile for this use case, provided you also make sure to install the PyTorch libraries (torch, torchvision, ...) from the alternative pip index.


@Schwenn2002 commented on GitHub (Jan 18, 2025):

I have already customized the Docker image and integrated ROCm with the Dockerfile mentioned above.

It would just be good for updates if I didn't have to do a rebuild every time an update for open-webui is released.

With the Dockerfile above it should be easy to offer ROCm as a standard container (just as is done with CUDA, e.g. an open-webui:rocm tag). That would help everyone.


@oatmealm commented on GitHub (Jan 19, 2025):

BTW, just for completeness, for people who run Open WebUI on bare metal: downgrading pytorch-rocm to 2.4.x solved the problem and the GPU no longer segfaults.


@Schwenn2002 commented on GitHub (Jan 20, 2025):

Attached are my updated files; the container must then be started with docker-compose up -d --build!

Testing ROCm in a container:

```
docker exec -it open-webui-rocm /bin/bash

root@3ac111a1e730:/app/backend# python
Python 3.11.11 (main, Dec 24 2024, 22:24:26) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sentence_transformers import SentenceTransformer
>>> import torch
>>> print("CUDA aktiviert:", torch.cuda.is_available())
CUDA aktiviert: True
>>> model = SentenceTransformer('ibm-granite/granite-embedding-278m-multilingual')
>>> print("Model runs on GPU:", next(model.parameters()).is_cuda)
Model runs on GPU: True
```

Or try the ROCm commands in the container:

```
docker exec -it open-webui-rocm /bin/bash
rocm-smi
```

docker-compose.yaml

```
  open-webui-rocm:
    build:
      context: .
      dockerfile: ./open-webui-rocm/Dockerfile
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui-rocm
    ipc: host
    volumes:
      - /data/open-webui-rocm/data:/app/backend/data
      - /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt:ro
      - /etc/localtime:/etc/localtime:ro
    depends_on:
      - redis
    ports:
      - 3000:8080
    environment:
      - 'USE_CUDA=true'
      - 'USE_CUDA_VER=rocm6.2'
      - 'USE_CUDA_DOCKER=true'
      - 'USE_CUDA_DOCKER_VER=rocm6.2'
      - 'WEBUI_NAME=ollama.xxx.xx'
      - 'WEBUI_URL=https://ollama.xxx.xx'
      - 'OLLAMA_BASE_URL=http://x.x.x.x:11434'
      - 'WEBUI_SECRET_KEY='
      - 'REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt'
      # only one ROCm device!
      - 'ROCR_VISIBLE_DEVICES=1'
      - 'HSA_OVERRIDE_GFX_VERSION=11.0.0'
      - 'HSA_ENABLE_SDMA=0'
      - 'RAG_EMBEDDING_MODEL=ibm-granite/granite-embedding-278m-multilingual'
      - 'WEBUI_SESSION_COOKIE_SECURE=true'
      # only if redis enabled
      #- 'ENABLE_WEBSOCKET_SUPPORT=true'
      #- 'WEBSOCKET_MANAGER=redis'
      #- 'WEBSOCKET_REDIS_URL=redis://redis:6379/2'
      - 'GLOBAL_LOG_LEVEL=ERROR'
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    group_add:
      - video
      - 993
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
```

Dockerfile

```
FROM ghcr.io/open-webui/open-webui:main

# Start
RUN apt update
RUN apt install -y wget
RUN apt install -y gpg

RUN apt install -y python3-setuptools python3-wheel libpython3.11

WORKDIR /app/backend
RUN wget https://repo.radeon.com/amdgpu-install/6.3.1/ubuntu/jammy/amdgpu-install_6.3.60301-1_all.deb
RUN apt install -y ./amdgpu-install_6.3.60301-1_all.deb
RUN amdgpu-install -y --usecase=rocm,hip --no-dkms

# Install PyTorch for ROCm
# Check the torch version and install the ROCm build if '+cpu' is found
RUN if pip show torch | grep 'Version' | grep -q '+cpu'; then \
        pip3 uninstall -y torch torchvision torchaudio && \
        pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4 --no-cache-dir; \
    else \
        echo "nothing to do..."; \
    fi
```
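The wheel-detection step in the Dockerfile keys off the local version suffix of the installed torch wheel: CPU-only wheels are versioned like `2.5.1+cpu`, ROCm wheels like `2.5.1+rocm6.2`. A small sketch of that logic, using a stand-in string instead of the live `pip show torch` output:

```shell
# Stand-in for: pip show torch | grep 'Version'
version_line="Version: 2.5.1+cpu"

# Same test the Dockerfile RUN step performs:
if echo "$version_line" | grep -q '+cpu'; then
    echo "cpu wheel detected - reinstall the ROCm build"
else
    echo "gpu wheel - nothing to do"
fi
```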

@Eudox67 commented on GitHub (Jan 25, 2025):

> Additional information:
>
> For Python 3.11 you need the installation package for ROCm 6.3. For torch 2.5.1 there are only packages for ROCm 6.2. According to my research, ROCm is compatible. ...

The release notes from AMD ROCm Docs (https://rocm.docs.amd.com/en/latest/about/release-notes.html) say: "If you're using Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, continue to use ROCm 6.2.3."

So does this mean the solution here will not work for those of us running a local install with a Radeon gfx1100 that is also used as a display?


@Schwenn2002 commented on GitHub (Jan 26, 2025):

I am currently using gfx1100 (Radeon 7900 XTX and Radeon Pro W7900 in a multi-GPU setup), and the above configuration is working.


@Eudox67 commented on GitHub (Jan 26, 2025):

I only have one card, so I guess I'll try and find out. Thanks!


@Schwenn2002 commented on GitHub (Jan 26, 2025):

The host system must also have ROCm installed (test with rocm-smi); then change the following line for Docker, since there is only one GPU in the system:
- 'ROCR_VISIBLE_DEVICES=0'


@Eudox67 commented on GitHub (Jan 26, 2025):

I am using ROCm 6.2.3 currently, which is why I asked the question above. I also notice that you are using jammy. Is that a requirement, or just particular to your system? I am on noble.


@Schwenn2002 commented on GitHub (Jan 26, 2025):

My host is running Ubuntu 24.04 LTS (Noble Numbat) with ROCm 6.3.1; the container is Debian 12 (hence jammy in the Open WebUI Docker setup).


@Eudox67 commented on GitHub (Jan 26, 2025):

Got it!


@mrwsl commented on GitHub (Jan 30, 2025):

@Schwenn2002 Did you try using `RAG_EMBEDDING_ENGINE: ollama` to let Ollama do the RAG embedding?


@Schwenn2002 commented on GitHub (Jan 30, 2025):

Hi! Yes, I tried that; ollama is significantly slower when it comes to embeddings or searching in RAG. ROCm in Docker is the choice for best performance.


@Schwenn2002 commented on GitHub (Feb 5, 2025):

Does anyone have an idea how to solve this warning?

```
/usr/local/lib/python3.11/site-packages/torch/nn/modules/linear.py:125: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:310.)
```

When I include ROCm 6.3.2 with the latest Python libraries from the Radeon repository, I also get the error:

```
RuntimeError: Attempting to use hipBLASLt on a unsupported architecture!
```

This error occurs when PyTorch tries to use the hipBLASLt library on an unsupported GPU architecture. The problem has been observed particularly on GPUs with the gfx1100 architecture, such as the AMD Radeon RX 7900 XTX, in conjunction with ROCm 6.2.2 and PyTorch 2.5 and above.

If anyone has an idea here, I would be grateful.


@jacazek commented on GitHub (Feb 9, 2025):

@Schwenn2002 Did you try disabling hipBLASLt by setting the environment variable TORCH_BLAS_PREFER_HIPBLASLT=0? While not technically solving the lack of hipBLASLt support, it should prevent attempts to use hipBLASLt and fall back to hipBLAS.
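One detail worth noting: the variable has to be in the environment before PyTorch initializes, since the BLAS backend preference is read at library load time. A minimal sketch (the torch import is commented out so the snippet runs anywhere):

```python
import os

# Must be set before torch is imported (or, in Docker, passed via the
# compose `environment:` section so the whole process inherits it).
os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"

# import torch  # only import torch after the variable is set

print(os.environ["TORCH_BLAS_PREFER_HIPBLASLT"])  # 0
```

In the docker-compose setup above, the equivalent would be adding `- 'TORCH_BLAS_PREFER_HIPBLASLT=0'` to the service's environment list.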


@Schwenn2002 commented on GitHub (Feb 9, 2025):

Yes, I tried TORCH_BLAS_PREFER_HIPBLASLT=0.

It just seems that the parameter doesn't work with the Radeon repository packages.

I'll stick with the official PyTorch release and therefore with ROCm 6.2.4:

```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4 --no-cache-dir
```


@Schwenn2002 commented on GitHub (Feb 18, 2025):

Audio transcription with ROCm:

The following error occurs during audio transcription with ROCm:

```
WARNI [python_multipart.multipart] Skipping data after last boundary
ERROR [open_webui.routers.audio] CUDA failed with error CUDA driver version is insufficient for CUDA runtime version
Traceback (most recent call last):
  File "/app/backend/open_webui/routers/audio.py", line 107, in set_faster_whisper_model
    whisper_model = WhisperModel(**faster_whisper_kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 647, in __init__
    self.model = ctranslate2.models.Whisper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version
```

This is obviously because ctranslate2 does not work with the ROCm libraries.

It would be helpful for Docker to get its own variable for ROCm so that audio transcription runs on the CPU, analogous to USE_CUDA_DOCKER=true with USE_ROCM_DOCKER=true. The variable DEVICE_TYPE must then be extended to include rocm accordingly.

In the file backend/open_webui/routers/audio.py, line 100 would have to be changed to: `"device": "cpu" if DEVICE_TYPE in ["rocm"] or DEVICE_TYPE not in ["cuda", "cpu"] else DEVICE_TYPE,`

Overall it looks like this:

```
def set_faster_whisper_model(model: str, auto_update: bool = False):
    whisper_model = None
    if model:
        from faster_whisper import WhisperModel

        faster_whisper_kwargs = {
            "model_size_or_path": model,
            "device": "cpu" if DEVICE_TYPE in ["rocm"] or DEVICE_TYPE not in ["cuda", "cpu"] else DEVICE_TYPE,
            "compute_type": "int8",
            "download_root": WHISPER_MODEL_DIR,
            "local_files_only": not auto_update,
        }

        try:
            whisper_model = WhisperModel(**faster_whisper_kwargs)
        except Exception:
            log.warning(
                "WhisperModel initialization failed, attempting download with local_files_only=False"
            )
            faster_whisper_kwargs["local_files_only"] = False
            whisper_model = WhisperModel(**faster_whisper_kwargs)
    return whisper_model
```

Currently I have adapted the code so that the CPU is always used: `"device": "cpu"`
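The proposed device-selection condition reduces to a small helper; a sketch of the same fallback logic (the function name is mine, the rule is the one proposed above: faster-whisper's ctranslate2 backend has no ROCm support, so everything except genuine CUDA or CPU falls back to CPU):

```python
def whisper_device(device_type: str) -> str:
    """Map open-webui's DEVICE_TYPE-style setting to a device string that
    faster-whisper/ctranslate2 can actually handle. Illustrative helper."""
    # Equivalent to:
    #   "cpu" if device_type in ["rocm"] or device_type not in ["cuda", "cpu"] else device_type
    return device_type if device_type in ("cuda", "cpu") else "cpu"

print(whisper_device("rocm"))  # cpu
print(whisper_device("cuda"))  # cuda
```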

Reference: github-starred/open-webui#3204