mirror of
https://github.com/open-webui/open-webui.git
synced 2026-06-08 02:01:07 -05:00
Degraded Nvidia Container Toolkit performance in v0.4.2 (compared to v0.3.35) #2735
Originally created by @unprofessional on GitHub (Nov 21, 2024).
Bug Report
Installation Method
I noticed a degradation in performance running Llama 3.1 70b 4-bit quant (https://ollama.com/library/llama3.1:70b).
Please refer to the steps to reproduce. The only difference here is that I simply installed the nvidia-container-toolkit directly via instructions here and saw a gain from 15-16 t/s on v0.3.35 go to ~19 t/s after running it in the GPU accelerated Open WebUI + Ollama container.
I would have expected the same performance after upgrading to v0.4.2 but this does not appear to be the case... it seems as if it's ignoring any GPU acceleration benefits provided by the toolkit and dropping it back to base performance (as if run on a fresh, locally deployed Ollama without any extra explicit CUDA optimizations/config).
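For anyone wanting to compare the t/s figures above outside the UI, here is a minimal sketch. It assumes the rate is derived from the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns with a generation response, and that the `response_token/s` value shown in Open WebUI is computed the same way. The function names and the sample metrics are hypothetical, chosen only to land near the numbers in this report.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation rate from Ollama's eval_count and eval_duration (nanoseconds)."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / eval_duration_ns * 1e9


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative means slower)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Hypothetical metrics, not measured data: roughly 19 t/s and 15.5 t/s.
    old = tokens_per_second(eval_count=380, eval_duration_ns=20_000_000_000)
    new = tokens_per_second(eval_count=310, eval_duration_ns=20_000_000_000)
    print(f"before: {old:.1f} t/s, after: {new:.1f} t/s, "
          f"change: {percent_change(old, new):.1f}%")
```

Running the same prompt a few times per version and averaging the rate would help separate a real regression from run-to-run noise.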
Environment
Open WebUI Version:
Ollama (if applicable):
Operating System:
hardware info:
Confirmation:
Expected Behavior:
sudo docker pull ghcr.io/open-webui/open-webui:ollama
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
Actual Behavior:
Description
Reproduction Details
Steps to Reproduce:
nvidia-container-toolkit is installed via instructions here
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
llama3.1:70b through the browser successfully
response_token/s (in my case it was around 18-19 t/s with my setup)
sudo docker pull ghcr.io/open-webui/open-webui:ollama
sudo docker stop open-webui && sudo docker rm open-webui
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
llama3.1:70b should still be there... repeat step 5 (in my case it dropped to around 15-16 t/s)
Logs and Screenshots
Docker Container Logs: