issue: Web UI performance issues #5155

Closed
opened 2025-11-11 16:13:25 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @ErmakovDmitriy on GitHub (May 13, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.9

Ollama Version (if applicable)

0.6.8 (but likely not applicable)

Operating System

Fedora 43

Browser (if applicable)

Chromium 136.0.7103.92

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

The Web UI part should not use that much CPU and RAM.

Actual Behavior

While generating an answer (on a remote Ollama instance), the browser consumes 100% of one CPU core (I suspect it could consume even more, but JavaScript's single-threaded execution limits it).
After a while (1-10 minutes) RAM consumption also becomes quite high, which I do not expect.

It is impossible to get any response, as the browser essentially freezes after about a minute of rendering the response from an ML model.
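I don't know Open WebUI's rendering internals, but a generic illustration of why per-token re-rendering freezes the tab: if the whole message is re-parsed and re-painted after every streamed token, the total work grows quadratically with message length, while coalescing tokens per repaint cuts it dramatically (the batch size of 50 below is an arbitrary example, not a measured value):

```python
# Hypothetical cost model: re-rendering the full message after each
# streamed token touches 1 + 2 + ... + n characters, i.e. O(n^2) work.
def total_work(tokens, batch=1):
    work, length = 0, 0
    for _ in range(0, tokens, batch):
        length += batch          # text grows by one batch of tokens
        work += length           # re-render cost ~ current text length
    return work

per_token = total_work(10_000)            # repaint after every token
batched = total_work(10_000, batch=50)    # coalesce 50 tokens per repaint
print(per_token // batched)               # → 49, i.e. ~50x less work
```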

In addition, it is unclear to me why the `/api/models` request returns the UUIDs of all files in the knowledge base. With about 60,000+ files in my knowledge base, the response is roughly ~43-50 MB, which is a lot to download over a mobile connection and a lot to parse on the client side.
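I have not confirmed the exact response schema, but a back-of-envelope calculation is consistent with the observed size if each model entry embeds the full file-UUID list (the ~20-model multiplier below is a hypothetical illustration, not taken from my deployment):

```python
import json
import uuid

# Estimate the JSON size of one embedded list of 60,000 file UUIDs.
file_ids = [str(uuid.uuid4()) for _ in range(60_000)]
one_list_mb = len(json.dumps(file_ids)) / 1e6
print(round(one_list_mb, 1))       # → 2.4 (MB per embedded UUID list)

# If ~20 model entries each repeat the same list, the payload balloons.
print(round(one_list_mb * 20, 1))  # → 48.0 (MB), matching the observed range
```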

Steps to Reproduce

  1. Open a browser with a single tab running Open WebUI.
  2. Open a new chat with a QWEN3 ("thinking") model.
  3. Run any request.
  4. Wait until browser freezes.
  5. Reproducible both in Chromium and Firefox 138.

Logs & Screenshots

![Image](https://github.com/user-attachments/assets/f26e39b1-442a-48da-8ef8-f011a523f3dc)

![Image](https://github.com/user-attachments/assets/04f4e951-fd0a-4336-9194-e8429166518d)

[chromium-console.log](https://github.com/user-attachments/files/20182378/chromium-console.log)

No errors or warnings in the server log.

Additional Information

Adding `/nothink` to the prompt or system prompt (to force the QWEN model to disable thinking mode and generate a response immediately) somewhat reduces the load, but the Web UI is still slow-ish.

GiteaMirror added the bug label 2025-11-11 16:13:25 -06:00

Reference: github-starred/open-webui#5155