mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-24 03:18:21 -05:00
[GH-ISSUE #18475] issue: When uploading a large document (full context) and asking a question, there is an issue with token generation streaming synchronization. #18607
Originally created by @Cyp9715 on GitHub (Oct 21, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18475
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.34
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24.04
Browser (if applicable)
Firefox 144.0
Confirmation
Expected Behavior
After the user sends a prompt, if all tokens have been correctly decoded in the background, OpenWebUI should display them immediately.
Actual Behavior
However, after attaching a large document (200k tokens), enabling full-context mode, and asking a question, OpenWebUI displays the answer's tokens extremely slowly in the browser window. If you open a new tab and refresh the page, you can confirm that all tokens have already been generated and are present within OpenWebUI.
In other words, OpenWebUI renders the response significantly slower than the actual generation speed: the lag is not a minor 1–2 seconds but at least 20 seconds, which severely impacts the user experience.
Steps to Reproduce
All environments are running within Docker, using the following commands:
Logs & Screenshots
In the browser window on the right, the user entered the question immediately, and the response is still being generated.
However, when accessing the same page from the browser window on the left, it is evident that the full response has already been generated (confirmed also by OpenWebUI’s “response completed” notification).
Additional Information
This issue occurs not only in Firefox but also in Chrome. Monitoring via nvidia-smi confirms that vLLM has completed token generation well before the UI begins to display them. When opening a new tab and reloading the OpenWebUI page, all tokens are already fully rendered and visible—yet, in the original tab where the question was submitted, the token streaming is painfully slow. This discrepancy clearly indicates a client-side rendering or event-stream synchronization bug specific to the active chat tab, not a backend or model performance issue.
@rgaricano commented on GitHub (Oct 21, 2025):
Have you tried:
I think that the slowdown occurs after SSE parsing. When splitLargeDeltas is enabled:
With large-context responses delivering hundreds of tokens per SSE event, this chunking probably creates the slowness you're experiencing.
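To make the arithmetic behind this concrete, here is a minimal sketch, not the actual Open WebUI code. The helper names `splitDelta` and `estimateDisplayMs`, the 2-character chunk size, and the 5 ms pause are all assumptions chosen for illustration. It only shows how a fixed pause per small chunk adds up when a single event carries a lot of text.

```typescript
// Hypothetical helper, not the Open WebUI implementation.
// Splits one large delta into fixed-size pieces, as a client-side
// "typing effect" might do before rendering.
function splitDelta(text: string, chunkSize: number): string[] {
  if (chunkSize < 1) throw new RangeError("chunkSize must be >= 1");
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    pieces.push(text.slice(i, i + chunkSize));
  }
  return pieces;
}

// Lower bound on the time needed to display one delta when the renderer
// waits delayMs after every piece. chunkSize and delayMs are assumed values.
function estimateDisplayMs(text: string, chunkSize: number, delayMs: number): number {
  return splitDelta(text, chunkSize).length * delayMs;
}

// A 60,000-character answer shown 2 characters at a time with a 5 ms pause.
const answer = "x".repeat(60000);
console.log(estimateDisplayMs(answer, 2, 5) / 1000, "seconds"); // 150 seconds
```

Under these assumed numbers the display alone takes minutes even if the backend has already finished, which would match the symptom of a fresh tab showing the full answer while the original tab is still "typing".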
Note: the delay is hardcoded here:
46ae3f4f5d/src/lib/apis/streaming/index.ts (L135)
@Cyp9715 commented on GitHub (Oct 23, 2025):
Increasing the "Stream Delta Chunk Size" value has resolved the issue. Is this due to a performance issue with the client PC?
Thank you for your help; you may close this issue.
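One possible reading of why a larger "Stream Delta Chunk Size" helps is that several deltas are grouped into a single UI update, so fewer updates are needed for the same text. The sketch below illustrates that idea only. `batchDeltas` is a hypothetical function, and whether the setting works this way internally is an assumption, not something confirmed in this thread.

```typescript
// Hypothetical sketch: group incoming deltas so the UI updates once per batch.
// This is an assumed model of the setting, not Open WebUI's implementation.
function batchDeltas(deltas: string[], batchSize: number): string[] {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  const batches: string[] = [];
  for (let i = 0; i < deltas.length; i += batchSize) {
    // Concatenate batchSize consecutive deltas into one update.
    batches.push(deltas.slice(i, i + batchSize).join(""));
  }
  return batches;
}

const deltas = ["The", " quick", " brown", " fox", " jumps"];
console.log(batchDeltas(deltas, 1).length); // 5 updates
console.log(batchDeltas(deltas, 3).length); // 2 updates
```

The text delivered is identical in both cases; only the number of render steps changes, which is why a slow client could benefit from a larger batch.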
@hdnh2006 commented on GitHub (Nov 26, 2025):
Sorry mate, but I can't find that option in Settings/General, I am facing the same issue.
@Cyp9715 any idea?
@hdnh2006 commented on GitHub (Nov 26, 2025):
Ok, it looks like my problem was similar, but I solved it just by setting:
UVICORN_WORKERS=1. Previously it was set to 4.
This solved my issue.