Mirror of https://github.com/open-webui/open-webui.git
feat: ROCm /w RAG and SentenceTransformers #3204
Originally created by @Schwenn2002 on GitHub (Jan 6, 2025).
Unfortunately, ROCm is only supported for Ollama, but that works fine.
For RAG, SentenceTransformers is unfortunately only implemented with CUDA; otherwise you are left with the CPU. SentenceTransformers with CUDA is also faster than Ollama with ROCm, and the reranking likewise runs on the CPU.
In any case, I see that about 100 documents (100 kB to 10 MB each) take a very long time to process in the RAG stage before the LLM (regardless of whether 3B, 8B, or 22B) runs, on a Radeon Pro W7900 with 48 GB VRAM.
PyTorch supports ROCm: https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html
@Schwenn2002 commented on GitHub (Jan 7, 2025):
Another Link:
https://rocm.blogs.amd.com/artificial-intelligence/sentence_transformers_amd/README.html
@Schwenn2002 commented on GitHub (Jan 17, 2025):
If I rebuild the Docker image with the attached Dockerfile (docker-compose up --build open-webui-rocm) and then call the embedding model from the console, it is loaded onto the GPU; evidently PyTorch's ROCm build also reports itself through the CUDA API.
docker-compose.yaml:
Dockerfile
ROCm Test with torch and cuda:
open-webui
The CPU is still used via open-webui (why?). Is it hard-coded that the model runs on the CPU?
@Schwenn2002 commented on GitHub (Jan 18, 2025):
Additional information:
For Python 3.11 you need the installation package for ROCm 6.3. For torch 2.5.1 there are only packages for ROCm 6.2. According to my research, the two ROCm versions are compatible. This is also confirmed by the fact that SentenceTransformers loads the embedding model with CUDA support.
How do I tell open-webui to use the GPU?
@oatmealm commented on GitHub (Jan 18, 2025):
USE_CUDA_DOCKER worked for me, in the sense that open-webui acknowledged it and reported the GPU as enabled, but on my hardware it only worked once: after restarting the server it simply hangs while attempting to load the libraries. You also need to make sure pip installed the corresponding PyTorch libraries built for ROCm, of course.
@Schwenn2002 commented on GitHub (Jan 18, 2025):
Thank you very much, now the Docker from open-webui actually runs with ROCm.
Perhaps an open-webui:rocm can be built?
The adjustments are specified in the Dockerfile above (--usecase=rocm should be sufficient for amdgpu-install).
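A rough sketch of the kind of Dockerfile additions meant here (everything except `amdgpu-install --usecase=rocm` and the PyTorch ROCm wheel index is an assumption; versions are illustrative and should match the host's ROCm):

```dockerfile
# Sketch only: add ROCm user-space libraries and ROCm builds of PyTorch
# on top of the open-webui base image.
# 1. Install the amdgpu-install helper .deb from repo.radeon.com
#    (pick the release matching the host ROCm), then:
RUN amdgpu-install -y --usecase=rocm --no-dkms
# 2. Replace the CUDA/CPU PyTorch wheels with ROCm wheels:
RUN pip3 install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/rocm6.2.4 --no-cache-dir
```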
@oatmealm commented on GitHub (Jan 18, 2025):
You can also use it outside Docker; it works the same. But, as mentioned, for me it caused problems. I think you can simply adapt the existing Dockerfile for this use case, provided you also make sure to install the PyTorch libraries (torch, torchvision, ...) from the alternative pip index.
@Schwenn2002 commented on GitHub (Jan 18, 2025):
I have already customized Docker and integrated ROCm with the Dockerfile mentioned above.
It would just be good, for updates, if I didn't have to rebuild every time a new open-webui release comes out.
With the Dockerfile above it should be easy to offer ROCm as a standard container image (analogous to CUDA, e.g. an open-webui:rocm tag). That would help everyone...
@oatmealm commented on GitHub (Jan 19, 2025):
BTW, just for completeness for people who run open-webui on bare metal: downgrading pytorch-rocm to 2.4.x solved the problem, and the GPU no longer segfaults.
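For the bare-metal downgrade, something along these lines should work (the exact version pin and wheel index are assumptions; check download.pytorch.org for the index that actually carries the 2.4.x ROCm builds):

```shell
# Pin PyTorch to a 2.4.x ROCm build instead of 2.5.x (versions assumed)
pip3 install 'torch==2.4.1' torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/rocm6.1 --no-cache-dir
# Verify the installed build reports a HIP version:
python3 -c "import torch; print(torch.__version__, torch.version.hip)"
```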
@Schwenn2002 commented on GitHub (Jan 20, 2025):
Attached are my updated files; the container must then be started with docker-compose up -d --build!
Testing ROCm in a container:
Or try the ROCm commands in the container:
docker-compose.yaml
Dockerfile
@Eudox67 commented on GitHub (Jan 25, 2025):
Release notes from AMD ROCm Docs, say "If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, continue to use ROCm 6.2.3."
So does this mean the solution here will not work for those of us running a local install with a Radeon gfx1100 that is also used as a display?
@Schwenn2002 commented on GitHub (Jan 26, 2025):
I am currently using gfx1100 (Radeon 7900xtx and Radeon Pro W7900 in multigpu setup) and the above configuration is working.
@Eudox67 commented on GitHub (Jan 26, 2025):
I only have one card, so I guess I'll find out by trying. Thanks!
@Schwenn2002 commented on GitHub (Jan 26, 2025):
The host system must also have ROCm installed (test with rocm-smi), then change the following line for Docker, since there is only one GPU in the system:
- 'ROCR_VISIBLE_DEVICES=0'

@Eudox67 commented on GitHub (Jan 26, 2025):
I am using ROCm 6.2.3 currently, which is why I asked the question above. I also notice that you are using jammy. Is that a requirement, or just particular to your system? I am on noble.
@Schwenn2002 commented on GitHub (Jan 26, 2025):
My host is running Ubuntu 24.04 LTS (Noble Numbat) and ROCm 6.3.1; the Docker is Debian 12 (hence jammy in open-webui docker).
@Eudox67 commented on GitHub (Jan 26, 2025):
Got it!
@mrwsl commented on GitHub (Jan 30, 2025):
@Schwenn2002 Did you try using
RAG_EMBEDDING_ENGINE: ollama
to let Ollama handle the RAG embedding?

@Schwenn2002 commented on GitHub (Jan 30, 2025):
Hi! Yes, I tried that; Ollama is significantly slower when it comes to embeddings or searching in RAG. ROCm in Docker is the choice for best performance.
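For reference, the setting mrwsl asked about is an ordinary environment variable; in docker-compose it would look roughly like this (the service name and use of compose are assumptions based on the setup discussed in this thread):

```yaml
services:
  open-webui:
    environment:
      - RAG_EMBEDDING_ENGINE=ollama   # route RAG embeddings through Ollama
```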
@Schwenn2002 commented on GitHub (Feb 5, 2025):
Does anyone have an idea how to solve this warning?
/usr/local/lib/python3.11/site-packages/torch/nn/modules/linear.py:125: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:310.)

When I include ROCm 6.3.2 with the latest Python libraries from the Radeon repository, I also get the error:

RuntimeError: Attempting to use hipBLASLt on a unsupported architecture!

This error occurs when PyTorch tries to use the hipBLASLt library on an unsupported GPU architecture. It has been observed particularly on GPUs of the gfx1100 architecture, such as the AMD Radeon RX 7900 XTX, in conjunction with ROCm 6.2.2 and PyTorch 2.5 and above.
If anyone has an idea here, I would be grateful.
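The UserWarning above says PyTorch itself falls back from hipBLASLt to plain hipBLAS when the architecture is unsupported; the RuntimeError appears when that fallback does not happen. A purely illustrative model of that decision (the function below is hypothetical, not PyTorch internals):

```python
# Illustrative model of the backend choice the warning describes:
# hipBLASLt only on supported architectures, otherwise plain hipBLAS.
def choose_blas_backend(prefer_hipblaslt: bool, arch_supported: bool) -> str:
    if prefer_hipblaslt and not arch_supported:
        # The case the UserWarning reports on gfx1100:
        # "Overriding blas backend to hipblas"
        return "hipblas"
    return "hipblaslt" if prefer_hipblaslt else "hipblas"
```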
@jacazek commented on GitHub (Feb 9, 2025):
@Schwenn2002 Did you try disabling hipblaslt by setting the environment variable
TORCH_BLAS_PREFER_HIPBLASLT=0? While not technically solving the lack of hipBLASLt support, it should prevent attempts to use hipBLASLt and fall back to hipblas.

@Schwenn2002 commented on GitHub (Feb 9, 2025):
Yes - I tried this:
TORCH_BLAS_PREFER_HIPBLASLT=0

It just seems that the parameter doesn't work with the Radeon repository.
I'll stick with the official pytorch release and then with ROCm 6.2.4:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4 --no-cache-dir;

@Schwenn2002 commented on GitHub (Feb 18, 2025):
The following error occurs during audio transcription with ROCm:
This is obviously because ctranslate2 does not work with the ROCm libraries.
It would be helpful for Docker to get its own variable for ROCm so that the audio transcription runs on the CPU: analogous to USE_CUDA_DOCKER=true, a USE_ROCM_DOCKER=true. The variable DEVICE_TYPE must then be extended with "rocm" accordingly.

In the file backend/open_webui/routers/audio.py, the following would have to be changed at line 100:

"device": "cpu" if DEVICE_TYPE in ["rocm"] or DEVICE_TYPE not in ["cuda", "cpu"] else DEVICE_TYPE,

Overall it looks like this:
Currently I have adapted the code so that the CPU is always used:
"device": "cpu"