[GH-ISSUE #19007] issue: Performance regression in Open WebUI v0.6.36 – noticeable slowdown with local models #18744

Closed
opened 2026-04-20 00:57:12 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @manhtv46k55 on GitHub (Nov 7, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19007

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Pip Install

Open WebUI Version

0.6.36

Ollama Version (if applicable)

0.12.10

Operating System

Windows 10

Browser (if applicable)

Chrome 142.0.7444.135

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

In Open WebUI v0.6.36, running a local model should remain about as responsive as in v0.6.31.

Queries should be processed and streamed without noticeable delay compared to previous versions.

Actual Behavior

In v0.6.36, the same local model is significantly slower than in v0.6.31.

After sending a query, the system waits much longer before starting to stream responses.

The slowdown appears even on simple queries.

Steps to Reproduce

  1. Start Open WebUI v0.6.36 and load a local model.

  2. Ask a question via the chat interface.

The system takes noticeably longer to respond compared to v0.6.31.
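To make the regression measurable rather than "noticeably longer," a time-to-first-token comparison between the two versions would help. Below is a minimal sketch that times how long Ollama's streaming `/api/generate` endpoint takes to emit its first chunk; the model name `llama3` and the default host/port are placeholder assumptions, so substitute the model actually used here.

```python
import json
import time
import urllib.request


def time_to_first_token(lines):
    """Return seconds elapsed until the first non-empty line arrives
    from the iterable `lines`, or None if nothing arrives."""
    start = time.monotonic()
    for line in lines:
        if line.strip():
            return time.monotonic() - start
    return None


def measure(model="llama3", prompt="Hello",
            host="http://localhost:11434"):
    """Send a streaming generate request to a local Ollama instance
    and return the time to the first streamed response line."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(
            {"model": model, "prompt": prompt, "stream": True}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Ollama streams newline-delimited JSON; the response object
        # is iterable line by line.
        return time_to_first_token(resp)
```

Running `measure()` against v0.6.31's and v0.6.36's backends (or directly against Ollama, to rule the proxy layer in or out) would show whether the delay is in Open WebUI's request handling or in model inference itself.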

Logs & Screenshots

Screenshots (from the original GitHub issue):
https://github.com/user-attachments/assets/cb449784-3889-41e2-8a7a-56bed287509c
https://github.com/user-attachments/assets/e0b451a3-f46c-4fa2-b4f9-bb94bdf06897

Additional Information

Windows 10
RAM: 64 GB
GPU: RTX 4070S, 12 GB VRAM
Model runs on CPU, not GPU
Python 3.11

GiteaMirror added the bug label 2026-04-20 00:57:12 -05:00
Reference: github-starred/open-webui#18744