mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 10:58:17 -05:00
[GH-ISSUE #15850] issue: missing tokens when streaming on fast inference providers #17692
Originally created by @kalebwalton on GitHub (Jul 18, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/15850
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.16
Ollama Version (if applicable)
No response
Operating System
macOS Sequoia
Browser (if applicable)
No response
Confirmation
Expected Behavior
Streaming output provides all streamed content and does not miss any parts
Actual Behavior
Streaming output occasionally misses a stream chunk (a few characters). It is often unnoticeable, and you may assume it's a model issue or an inference provider issue; however, I have validated that the issue occurs with multiple models from multiple inference providers.
Steps to Reproduce
1. Monkeypatch 2470da8336/backend/open_webui/utils/middleware.py (L2042) with `log.debug(f"Error: {e}")` (this enables debug logging to print streaming errors properly).
2. Run the patched file with debug logging enabled: `docker run -d --name openwebui -p 3000:8080 -e GLOBAL_LOG_LEVEL=debug -v /path/to/monkeypatched_middleware.py:/app/backend/open_webui/utils/middleware.py -v openwebui-data:/app/backend/data --restart unless-stopped ghcr.io/open-webui/open-webui:latest`

**NOTE:** I believe this happens more frequently on faster streaming models like OpenAI gpt-4o-mini or Cerebras qwen-3-235b-a22b.
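The failure mode can be illustrated in isolation. The following is a hypothetical sketch (not Open WebUI's actual code) showing why a parser that treats every network chunk as a whole SSE event silently drops tokens when a fast provider's event straddles two reads:

```python
import json

# One complete SSE event, as an OpenAI-style provider would send it...
event = b'data: {"choices": [{"delta": {"content": "hello"}}]}\n\n'

# ...but arriving split across two reads, as can happen under fast
# streaming (the split point within the event is arbitrary).
chunks = [event[:25], event[25:]]

def naive_parse(chunk: bytes):
    """Parse a chunk as if it were always a whole event (buggy)."""
    text = chunk.decode()
    if text.startswith("data: "):
        text = text[len("data: "):]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None  # partial JSON: the token is silently dropped

results = [naive_parse(c) for c in chunks]
print(results)  # both halves fail to parse: [None, None]
```

Both fragments are invalid JSON on their own, so the `"hello"` delta never reaches the UI, which matches the "few missing characters" symptom.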
Logs & Screenshots
Additional Information
I have investigated this at length. If you add some debugging after line 1786, you'll find that: 1. every so often a valid JSON event is split across two `line` iterations, where the first contains some of the data including the beginning of the JSON string and the second contains the remaining part of the JSON string; and 2. each `line` does not contain line endings.

I dug around and found 2470da8336/backend/open_webui/routers/openai.py (L865), which seems to use aiohttp.ClientSession, and then I tried to follow that through and got a bit confused. I don't know whether the correct solution is to do buffering in Open WebUI's middleware.py where it processes the lines (which won't work well because the line endings are not showing up, so you can only key on something like `}`), or whether the solution is to do something lower level to prevent SSE JSON lines from ever being fragmented in the first place...

@kalebwalton commented on GitHub (Jul 24, 2025):
I have done additional testing and believe there is a direct correlation between the speed of the inference provider and the number of missing tokens. I believe the fragmentation is occurring in Open WebUI's use of a dependent library such as aiohttp. I am not certain whether the fix needs to land in Open WebUI, in aiohttp, or elsewhere, but I think the investigation needs to start in Open WebUI.
I believe this will become more prevalent as inference providers run on faster AI hardware.
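The buffering approach discussed above can be sketched as follows. This is a hypothetical illustration of the general technique (accumulate raw bytes and only emit events at the SSE blank-line delimiter), not Open WebUI's actual implementation:

```python
import json

class SSEBuffer:
    """Reassemble SSE events from arbitrarily fragmented network chunks.

    Sketch only: buffers raw bytes and emits one parsed JSON payload per
    complete `data: ...` event, so an event split across chunks is never
    dropped. SSE events are terminated by a blank line (b"\\n\\n").
    """

    def __init__(self):
        self._buf = b""

    def feed(self, chunk: bytes):
        self._buf += chunk
        events = []
        while b"\n\n" in self._buf:
            raw, self._buf = self._buf.split(b"\n\n", 1)
            for line in raw.split(b"\n"):
                if line.startswith(b"data: "):
                    payload = line[len(b"data: "):].decode()
                    if payload.strip() == "[DONE]":
                        continue  # OpenAI-style end-of-stream sentinel
                    events.append(json.loads(payload))
        return events

# An event fragmented across two reads is recovered intact:
buf = SSEBuffer()
event = b'data: {"choices": [{"delta": {"content": "hi"}}]}\n\n'
out = buf.feed(event[:20]) + buf.feed(event[20:])
print(out[0]["choices"][0]["delta"]["content"])  # "hi"
```

Keying on the event delimiter rather than on `}` avoids the fragility noted above, since `}` can legitimately appear inside string content.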
@tjbck commented on GitHub (Aug 4, 2025):
Unable to reproduce, keep us updated!