Mirror of https://github.com/open-webui/open-webui.git
feat: ROCm /w RAG and SentenceTransformers #3204
Originally created by @Schwenn2002 on GitHub (Jan 6, 2025).
Unfortunately, ROCm is only supported for Ollama, but that works fine.
For RAG, SentenceTransformers is unfortunately only implemented with CUDA; otherwise you are left with the CPU. SentenceTransformers with CUDA is also faster than Ollama with ROCm, and the reranking likewise runs on the CPU.
In any case, I see that about 100 documents (100 kB to 10 MB each) take a very long time to process in the RAG stage before the LLM (regardless of whether 3B, 8B, or 22B) runs, on a Radeon Pro W7900 with 48 GB VRAM.
PyTorch supports ROCm: https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html
@Schwenn2002 commented on GitHub (Jan 7, 2025):
Another Link:
https://rocm.blogs.amd.com/artificial-intelligence/sentence_transformers_amd/README.html
@Schwenn2002 commented on GitHub (Jan 17, 2025):
If I rebuild the Docker image with the attached Dockerfile (docker-compose up --build open-webui-rocm) and then call the embedding model from the console, it is loaded onto the GPU; evidently PyTorch's ROCm build also reports itself through the CUDA API.
docker-compose.yaml:
Dockerfile
ROCm Test with torch and cuda:
open-webui
The CPU is still used via open-webui (why?). Is it hard-coded that the model runs on the CPU?
@Schwenn2002 commented on GitHub (Jan 18, 2025):
Additional information:
For Python 3.11 you need the installation package for ROCm 6.3. For torch 2.5.1 there are only packages for ROCm 6.2. According to my research, the two ROCm versions are compatible. This is also confirmed by the fact that SentenceTransformers loads the embedding model with CUDA support.
How do I tell open-webui to use the GPU?
@oatmealm commented on GitHub (Jan 18, 2025):
USE_CUDA_DOCKER worked for me, in the sense that open-webui acknowledged it and reported the GPU as enabled, but on my hardware it only worked once: after restarting the server it simply hangs while attempting to load the libraries. You also need to make sure pip installed the corresponding PyTorch libraries built for ROCm, of course.
@Schwenn2002 commented on GitHub (Jan 18, 2025):
Thank you very much, now the Docker from open-webui actually runs with ROCm.
Perhaps an open-webui:rocm can be built?
The adjustments are specified in the Dockerfile above (--usecase=rocm should be sufficient for amdgpu-install).
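A rough sketch of the kind of Dockerfile additions meant here (everything except `amdgpu-install --usecase=rocm` and the PyTorch ROCm wheel index is an assumption; versions are illustrative and should match the host's ROCm):

```dockerfile
# Sketch only: add ROCm user-space libraries and ROCm builds of PyTorch
# on top of the open-webui base image.
# 1. Install the amdgpu-install helper .deb from repo.radeon.com
#    (pick the release matching the host ROCm), then:
RUN amdgpu-install -y --usecase=rocm --no-dkms
# 2. Replace the CUDA/CPU PyTorch wheels with ROCm wheels:
RUN pip3 install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/rocm6.2.4 --no-cache-dir
```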
@oatmealm commented on GitHub (Jan 18, 2025):
You can also use it outside Docker; it works the same. But, as mentioned, for me it caused problems. I think you can simply adapt the existing Dockerfile for this use case, provided you also make sure to install the PyTorch libraries (torch, torchvision, ...) from the alternative pip index.
@Schwenn2002 commented on GitHub (Jan 18, 2025):
I have already customized Docker and integrated ROCm with the Dockerfile mentioned above.
It would just be good, for updates, if I didn't have to rebuild every time a new open-webui release comes out.
With the Dockerfile above it should be easy to offer ROCm as a standard container image (analogous to CUDA, e.g. an open-webui:rocm tag). That would help everyone...
@oatmealm commented on GitHub (Jan 19, 2025):
BTW, just for completeness for people who run open-webui on bare metal: downgrading pytorch-rocm to 2.4.x solved the problem, and the GPU no longer segfaults.
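For the bare-metal downgrade, something along these lines should work (the exact version pin and wheel index are assumptions; check download.pytorch.org for the index that actually carries the 2.4.x ROCm builds):

```shell
# Pin PyTorch to a 2.4.x ROCm build instead of 2.5.x (versions assumed)
pip3 install 'torch==2.4.1' torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/rocm6.1 --no-cache-dir
# Verify the installed build reports a HIP version:
python3 -c "import torch; print(torch.__version__, torch.version.hip)"
```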
@Schwenn2002 commented on GitHub (Jan 20, 2025):
Attached are my updated files; the container must then be started with docker-compose up -d --build!
Testing ROCm in a container:
Or try the ROCm commands in the container:
docker-compose.yaml
Dockerfile
@Eudox67 commented on GitHub (Jan 25, 2025):
Release notes from AMD ROCm Docs, say "If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, continue to use ROCm 6.2.3."
So does this mean the solution here will not work for those of us running a local install with a Radeon gfx1100 that is also used as a display?
@Schwenn2002 commented on GitHub (Jan 26, 2025):
I am currently using gfx1100 (Radeon 7900xtx and Radeon Pro W7900 in multigpu setup) and the above configuration is working.
@Eudox67 commented on GitHub (Jan 26, 2025):
I only have one card, so I guess I'll find out by trying. Thanks!
@Schwenn2002 commented on GitHub (Jan 26, 2025):
The host system must also have ROCm installed (test with rocm-smi), then change the following line for Docker, since there is only one GPU in the system:
- 'ROCR_VISIBLE_DEVICES=0'

@Eudox67 commented on GitHub (Jan 26, 2025):
I am using ROCm 6.2.3 currently, which is why I asked the question above. I also notice that you are using jammy. Is that a requirement, or just particular to your system? I am on noble.
@Schwenn2002 commented on GitHub (Jan 26, 2025):
My host is running Ubuntu 24.04 LTS (Noble Numbat) and ROCm 6.3.1; the Docker is Debian 12 (hence jammy in open-webui docker).
@Eudox67 commented on GitHub (Jan 26, 2025):
Got it!
@mrwsl commented on GitHub (Jan 30, 2025):
@Schwenn2002 Did you try using
RAG_EMBEDDING_ENGINE: ollama
to let Ollama handle the RAG embedding?

@Schwenn2002 commented on GitHub (Jan 30, 2025):
Hi! Yes, I tried that; Ollama is significantly slower when it comes to embeddings or searching in RAG. ROCm in Docker is the choice for best performance.
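For reference, the setting mrwsl asked about is an ordinary environment variable; in docker-compose it would look roughly like this (the service name and use of compose are assumptions based on the setup discussed in this thread):

```yaml
services:
  open-webui:
    environment:
      - RAG_EMBEDDING_ENGINE=ollama   # route RAG embeddings through Ollama
```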
@Schwenn2002 commented on GitHub (Feb 5, 2025):
Does anyone have an idea how to solve this warning?
/usr/local/lib/python3.11/site-packages/torch/nn/modules/linear.py:125: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:310.)

When I include ROCm 6.3.2 with the latest Python libraries from the Radeon repository, I also get the error:

RuntimeError: Attempting to use hipBLASLt on a unsupported architecture!

This error occurs when PyTorch tries to use the hipBLASLt library on an unsupported GPU architecture. It has been observed particularly on GPUs of the gfx1100 architecture, such as the AMD Radeon RX 7900 XTX, in conjunction with ROCm 6.2.2 and PyTorch 2.5 and above.
If anyone has an idea here, I would be grateful.
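The UserWarning above says PyTorch itself falls back from hipBLASLt to plain hipBLAS when the architecture is unsupported; the RuntimeError appears when that fallback does not happen. A purely illustrative model of that decision (the function below is hypothetical, not PyTorch internals):

```python
# Illustrative model of the backend choice the warning describes:
# hipBLASLt only on supported architectures, otherwise plain hipBLAS.
def choose_blas_backend(prefer_hipblaslt: bool, arch_supported: bool) -> str:
    if prefer_hipblaslt and not arch_supported:
        # The case the UserWarning reports on gfx1100:
        # "Overriding blas backend to hipblas"
        return "hipblas"
    return "hipblaslt" if prefer_hipblaslt else "hipblas"
```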
@jacazek commented on GitHub (Feb 9, 2025):
@Schwenn2002 Did you try disabling hipblaslt by setting the environment variable
TORCH_BLAS_PREFER_HIPBLASLT=0? While not technically solving the lack of hipBLASLt support, it should prevent attempts to use hipBLASLt and fall back to hipblas.

@Schwenn2002 commented on GitHub (Feb 9, 2025):
Yes - I tried this:
TORCH_BLAS_PREFER_HIPBLASLT=0

It just seems that the parameter doesn't work with the Radeon repository.
I'll stick with the official pytorch release and then with ROCm 6.2.4:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4 --no-cache-dir;

@Schwenn2002 commented on GitHub (Feb 18, 2025):
The following error occurs during audio transcription with ROCm:
This is obviously because ctranslate2 does not work with the ROCm libraries.
It would be helpful for Docker to get its own variable for ROCm so that the audio transcription runs on the CPU: analogous to USE_CUDA_DOCKER=true, a USE_ROCM_DOCKER=true. The variable DEVICE_TYPE must then be extended with "rocm" accordingly.

In the file backend/open_webui/routers/audio.py, the following would have to be changed at line 100:

"device": "cpu" if DEVICE_TYPE in ["rocm"] or DEVICE_TYPE not in ["cuda", "cpu"] else DEVICE_TYPE,

Overall it looks like this:
Currently I have adapted the code so that the CPU is always used:
"device": "cpu"