issue: GPU acceleration not working as expected on main branch #5703

Closed
opened 2025-11-11 16:30:29 -06:00 by GiteaMirror · 4 comments

Originally created by @kaiiquetome on GitHub (Jul 4, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.5.16

Ollama Version (if applicable)

No response

Operating System

Ubuntu

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      - Start with the initial platform/version/OS and dependencies used,
      - Specify exact install/launch/configure commands,
      - List URLs visited, user input (incl. example values/emails/passwords if needed),
      - Describe all options and toggles enabled or changed,
      - Include any files or environmental changes,
      - Identify the expected and actual result at each stage,
      - Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When running the Open WebUI container from the main branch image with `--gpus all`, I expected GPU acceleration to be enabled and used, resulting in faster RAG processing, especially during document ingestion.

Actual Behavior

Despite using the `--gpus all` flag, performance remains slow with the main branch image; the GPU does not appear to be utilized. In contrast, the older cuda branch image (v0.5.16) performs significantly better.

Steps to Reproduce

1. Run the container using the main branch image with GPU enabled:

   ```bash
   docker run -d -p 3000:8080 --gpus all -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
   ```

2. Upload one or more documents to a knowledge collection.
3. Observe the performance during ingestion and processing.
4. Repeat the same steps using the cuda branch image.
5. Compare the speed and GPU usage with `nvidia-smi`.
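To narrow down whether the container can see the GPU at all (as opposed to the embedding code simply not using it), a quick check like the following may help. This is a sketch: it assumes the container name `open-webui` from the run command above and that the NVIDIA container runtime exposes `nvidia-smi` inside the container.

```shell
# Check that the NVIDIA runtime exposes the GPU inside the running container
docker exec open-webui nvidia-smi

# Watch GPU utilization on the host while ingesting documents
watch -n 1 nvidia-smi
```

If the first command fails, the problem is the container/runtime setup rather than the Open WebUI image itself.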

Logs & Screenshots

With cuda:

[screenshot]

With main:

[screenshot]

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 16:30:29 -06:00

@tjbck commented on GitHub (Jul 4, 2025):

You need to use the `:cuda`-tagged images; please read the docs and existing issues and discussions before creating a new one.
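For reference, using the `:cuda` tag mentioned here would look like the following, a sketch based on the run command from the issue with only the image tag changed:

```shell
docker run -d -p 3000:8080 --gpus all \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:cuda
```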


@kaiiquetome commented on GitHub (Jul 7, 2025):

@tjbck I understand that the cuda branch should be used for GPU support, but it is currently outdated (v0.5.16). I've already started setting up my environment using the latest version (v0.6.15), which includes several recent integrations that are important for my setup. However, if I switch to the cuda branch, I lose access to these newer features.


@CoolSpot commented on GitHub (Jul 24, 2025):

@tjbck the instructions on the main page say to use the `:ollama`-tagged image for GPU acceleration (which doesn't work):

[screenshot of the README instructions]

@CoolSpot commented on GitHub (Jul 25, 2025):

The solution is to build your own image with both CUDA and Ollama enabled:

```bash
# check if your GPU is compatible
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu24.04 nvidia-smi

# clone the open-webui repository and enter it
git clone https://github.com/open-webui/open-webui.git
cd open-webui

# build the custom image (may take an hour or two);
# use --build-arg="USE_CUDA_VER=cu121" to downgrade the CUDA version
# to one compatible with your GPU/driver
docker build -t open-webui-ollama-cuda --build-arg="USE_CUDA=true" --build-arg="USE_OLLAMA=true" ./

# run it
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always open-webui-ollama-cuda
```
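As a sanity check once the custom image is running, one way to confirm that the backend's ML stack sees the GPU is to query PyTorch from inside the container. This is a hypothetical check: it assumes `python3` and `torch` are present in the image, which a `USE_CUDA=true` build should provide.

```shell
# Should print "True" if the CUDA build can reach the GPU
docker exec open-webui python3 -c "import torch; print(torch.cuda.is_available())"
```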

Reference: github-starred/open-webui#5703