Degraded Nvidia Container Toolkit performance in v0.4.2 (compared to v0.3.35) #2735

Closed
opened 2025-11-11 15:13:19 -06:00 by GiteaMirror · 0 comments

Originally created by @unprofessional on GitHub (Nov 21, 2024).

Bug Report

Installation Method

Installing Open WebUI with Bundled Ollama Support
This installation method uses a single container image that bundles Open WebUI with Ollama, allowing for a streamlined setup via a single command. Choose the appropriate command based on your hardware setup:
With GPU Support: Utilize GPU resources by running the following command:
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

I noticed a degradation in performance running Llama 3.1 70b 4-bit quant (https://ollama.com/library/llama3.1:70b)

  • v0.3.35 — 19 t/s
  • v0.4.2 — 16 t/s

Please refer to the steps to reproduce. The only difference is that I installed the nvidia-container-toolkit directly via the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html and saw throughput on v0.3.35 rise from 15-16 t/s to ~19 t/s when running in the GPU-accelerated Open WebUI + Ollama container.

I expected the same performance after upgrading to v0.4.2, but that does not appear to be the case. It seems to be ignoring the GPU acceleration benefits provided by the toolkit and falling back to base performance (as if run on a fresh, locally deployed Ollama without any explicit CUDA optimizations or configuration).
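One way to test this hypothesis is to check whether the model is actually resident on the GPUs while a response is being generated. The following is a minimal sketch, not part of the original report. It assumes the container is named open-webui as in the run command, and that the nvidia-smi and ollama binaries are reachable inside the bundled image (an assumption worth verifying). The DOCKER and CONTAINER variables are hypothetical conveniences.

```shell
# Hypothetical helper variables; override them to match your setup.
DOCKER="${DOCKER:-docker}"
CONTAINER="${CONTAINER:-open-webui}"

# Print per-GPU memory use and utilization, then the models Ollama has loaded.
# Assumes nvidia-smi and ollama are on PATH inside the container.
check_gpu_offload() {
  $DOCKER exec "$CONTAINER" nvidia-smi \
    --query-gpu=index,name,memory.used,utilization.gpu --format=csv
  $DOCKER exec "$CONTAINER" ollama ps
}
```

Run check_gpu_offload while a chat response is streaming. If the PROCESSOR column of ollama ps reports a CPU/GPU split rather than 100% GPU, part of the model is being computed on the CPU, which could account for a lower t/s figure.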

Environment

  • Open WebUI Version:

    • v0.4.2
  • Ollama (if applicable):

    • whichever one is embedded with v0.4.2
  • Operating System:

    • Ubuntu Server 22.04.5 LTS

hardware info:

  • Supermicro H12SSL-i mobo
  • AMD Epyc 7402P
  • 256GB DDR4 ECC
  • Dual RTX 3090 Founders Edition each in a PCIe 4.0 x16 slot
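Because the :ollama tag is a floating tag and the embedded Ollama version is listed above as unknown, it may help to record exactly what is being compared before and after the pull. This is a sketch under assumptions: the container name open-webui comes from the run command, the ollama binary is assumed to be on PATH inside the image, and the DOCKER, CONTAINER, and IMAGE variables are hypothetical.

```shell
# Hypothetical helper variables; override them to match your setup.
DOCKER="${DOCKER:-docker}"
CONTAINER="${CONTAINER:-open-webui}"
IMAGE="${IMAGE:-ghcr.io/open-webui/open-webui:ollama}"

# Record what is actually being tested, since the :ollama tag moves between releases.
record_versions() {
  # Digest of the locally pulled image behind the floating tag.
  $DOCKER image inspect --format '{{index .RepoDigests 0}}' "$IMAGE"
  # Version of the Ollama binary bundled in the running container.
  $DOCKER exec "$CONTAINER" ollama --version
}
```

Running record_versions once on the v0.3.35 container and once on the v0.4.2 container would show whether the bundled Ollama changed between the two images, which is one plausible source of a throughput difference.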

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [x] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

  • I upgraded to v0.4.2 from v0.3.35 via sudo docker pull ghcr.io/open-webui/open-webui:ollama
  • I expected the same performance with the same command as before:
  • docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
  • Before, I was getting ~19 t/s running Llama 3.1 70b 4-bit quant (https://ollama.com/library/llama3.1:70b)

Actual Behavior:

  • After updating to v0.4.2, I noticed a drop to 15-16 t/s running Llama 3.1 70b 4-bit quant (https://ollama.com/library/llama3.1:70b)
  • Switching back to the v0.3.35 image, which I still had, brought it back up to 19 t/s. This suggests to me that the issue was introduced in a release after v0.3.35

Description

  • Degraded Container Toolkit performance in v0.4.2 (compared to v0.3.35)

Reproduction Details

Steps to Reproduce:

  1. Start out on v0.3.35
  2. Make sure nvidia-container-toolkit is installed via the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
  3. Run via docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
  4. Install llama3.1:70b through the browser successfully
  5. Chat with the model using simple "hello" and test messages in the UI and note the response_token/s (in my case it was around 18-19 t/s with my setup)
  6. Update to v0.4.2 via sudo docker pull ghcr.io/open-webui/open-webui:ollama
  7. Clean up via sudo docker stop open-webui && sudo docker rm open-webui
  8. Re-run via docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
  9. llama3.1:70b should still be there... repeat step 5 (in my case it dropped to around 15-16 t/s)
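The measurement in steps 5 and 9 could also be taken from the command line, bypassing the UI, to help separate a change in the bundled Ollama from a change in Open WebUI itself. This is a rough sketch rather than the reporter's method. It assumes the ollama binary is on PATH inside the container. DOCKER, CONTAINER, MODEL, and both function names are hypothetical.

```shell
# Hypothetical helper variables; override them to match your setup.
DOCKER="${DOCKER:-docker}"
CONTAINER="${CONTAINER:-open-webui}"
MODEL="${MODEL:-llama3.1:70b}"

# Run one prompt through the bundled Ollama and print its timing statistics.
# --verbose makes `ollama run` report an "eval rate" in tokens/s.
bench_once() {
  $DOCKER exec "$CONTAINER" ollama run "$MODEL" --verbose "${1:-hello}"
}

# Percentage change from a baseline rate to a new rate, to one decimal place.
pct_change() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - base) / base * 100 }'
}
```

For the figures in this report, pct_change 19 16 prints -15.8, that is, a drop of roughly 16%. Repeating bench_once several times per image and comparing the eval rates would give a steadier number than a single chat message.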

Logs and Screenshots

Docker Container Logs:

  • open-webui_v0.4.2_container-825ee17d8bac_log.txt (https://github.com/user-attachments/files/17842675/open-webui_v0.4.2_container-825ee17d8bac_log.txt)
  • open-webui_v0.3.35_container-3cadac2562e9_log.txt (https://github.com/user-attachments/files/17842676/open-webui_v0.3.35_container-3cadac2562e9_log.txt)
Reference: github-starred/open-webui#2735