mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-22 14:13:08 -05:00
The rerank model cannot run on the GPU, causing it to be very slow. #2501
Originally created by @zero456 on GitHub (Oct 29, 2024).
Computer:
2X Intel(R) Xeon(R) Gold 6242R CPU
64.0 GB RAM
NVIDIA Quadro RTX 6000
Open-WebUI settings:
Engine: Ollama
Embedding Batch Size = 12
Hybrid Search: Enabled
Embed Model: bge-m3:latest
Rerank Model: baai/bge-reranker-v2-m3 (downloaded from Hugging Face)
After running a query, we watched the backend: the embedding step completes very quickly, with only brief CUDA GPU utilization. CPU utilization then climbs to 60%~100% and stays high for a prolonged period until the answer is generated.
Based on these observations, we suspect that the rerank model is running on the CPU. Is it possible to modify it to run on the GPU to improve the speed?
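One way to narrow this down is to check, from the same Python environment the backend runs in, whether PyTorch can see the GPU and whether the reranker loads on it. The sketch below is only a diagnostic, not Open WebUI's actual loading code: it assumes the reranker can be loaded as a sentence-transformers `CrossEncoder`, and the helper names `pick_device` and `score_pairs` are hypothetical.

```python
from typing import List, Tuple

RERANK_MODEL = "BAAI/bge-reranker-v2-m3"  # the rerank model named in this issue


def pick_device(cuda_available: bool) -> str:
    """Return the torch device string to use for the reranker."""
    return "cuda" if cuda_available else "cpu"


def score_pairs(query: str, passages: List[str]) -> Tuple[str, List[float]]:
    """Load the reranker on the best available device and score each passage."""
    # Imported lazily so pick_device() can be used without torch installed.
    import torch
    from sentence_transformers import CrossEncoder

    device = pick_device(torch.cuda.is_available())
    # Assumption: the reranker is loadable as a sentence-transformers CrossEncoder.
    model = CrossEncoder(RERANK_MODEL, device=device)
    scores = model.predict([(query, passage) for passage in passages])
    return device, [float(s) for s in scores]


# Example call (downloads the model on first use):
#   device, scores = score_pairs("what is a reranker?", ["A reranker reorders search hits."])
#   print(device, scores)
```

If `score_pairs` reports `cpu` on this machine, the PyTorch build in that environment probably cannot see the Quadro RTX 6000 (for example a CPU-only build, or a container started without GPU access), which would be consistent with the high CPU load described above.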