Mirror of https://github.com/open-webui/open-webui.git, synced 2026-05-21 17:13:00 -05:00
[PR #23735] [CLOSED] perf: emit text tokens as deltas instead of re-serializing the whole chat per SSE event #114652
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/23735
Author: @Classic298
Created: 4/14/2026
Status: ❌ Closed
Base: dev ← Head: perf/stream-delta-emit
📝 Commits (3)
07ee26b docs: add writeup for O(N^2) WebSocket payload bloat during LLM streaming
d7b36be perf(stream): emit text tokens as deltas instead of re-serializing the whole chat per SSE event
6155105 style(stream): tighten comments on delta-emit change
📊 Changes
1 file changed (+67 additions, -12 deletions)
📝 backend/open_webui/utils/middleware.py (+67 -12)
📄 Description
Partially fixes https://github.com/open-webui/open-webui/issues/23733
The streaming hot path in streaming_chat_response_handler.stream_body_handler was calling serialize_output(full_output()) on every SSE event, rebuilding an HTML string of the entire accumulated output (text + reasoning + tool calls + images + citations) and emitting it via Socket.IO.
For an N-token response this is O(N) work per token, which adds up to O(N^2) bytes across the WebSocket. That traffic is then amplified a second time by Socket.IO's AsyncRedisManager pub/sub (multiplied by Redis nodes × workers) and a third time by the frontend Markdown re-parser. A 2000-token reply at normal scale could push ~10,000x more bytes through Redis than the new tokens actually needed.
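To see why the growth is quadratic, a rough model helps: if every event re-sends the whole accumulated text, the k-th event carries k tokens, so an N-token reply sends N(N+1)/2 token-payloads in total instead of N. The sketch below assumes a hypothetical average of 4 bytes per token and counts only the message text; it ignores the HTML wrapper, reasoning and tool-call content, and Redis fan-out, so it illustrates the growth rate rather than reproducing the ~10,000x estimate.

```python
def full_resend_bytes(n_tokens: int, bytes_per_token: int = 4) -> int:
    """Total bytes if every event re-serializes the whole accumulated text."""
    # After token k the payload holds k tokens; summing k over 1..N gives N(N+1)/2.
    return bytes_per_token * n_tokens * (n_tokens + 1) // 2


def delta_bytes(n_tokens: int, bytes_per_token: int = 4) -> int:
    """Total bytes if every event carries only the newly appended token."""
    return bytes_per_token * n_tokens


if __name__ == "__main__":
    n = 2000  # the reply length used in the estimate above
    full = full_resend_bytes(n)
    delta = delta_bytes(n)
    print(f"full re-serialize: {full:,} bytes")
    print(f"delta only:        {delta:,} bytes")
    print(f"ratio:             {full / delta:.1f}x")
```

The ratio works out to (N+1)/2, so it keeps growing linearly with reply length; any per-event fan-out then multiplies the already quadratic total.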
Fix: when an SSE event only appended text characters to the currently active message block (the overwhelming common case), accumulate those chars into a pending_text_delta buffer and emit them as a lightweight chat:message:delta event. The frontend already handles that event via string append (Chat.svelte:472-473), so no frontend changes are needed.
Any structural event (reasoning update, reasoning → message transition, tool call, image attachment, tag-block append, annotation source, etc.) still goes through the legacy full-serialize path, and when it does it clears the pending text delta buffer, because the full-content checkpoint already contains the accumulated text. Ordering is preserved by flushing pending deltas before each structural emit.
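A minimal sketch of the emit decision described above, assuming each SSE event has already been classified as either "text appended" or "structural". StreamEmitter, handle_event, and the {"content": ...} payload shape are hypothetical; emit and serialize stand in for the Socket.IO emitter and serialize_output(full_output()). This is not the PR's middleware code. The sketch flushes the buffer at the end of every text-only event, so no text is ever pending when a structural event arrives, which keeps deltas ordered ahead of the checkpoint.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StreamEmitter:
    # Hypothetical stand-ins: `emit` for the Socket.IO event emitter and
    # `serialize` for serialize_output(full_output()).
    emit: Callable[[str, dict], None]
    serialize: Callable[[], str]
    pending_text_delta: list[str] = field(default_factory=list)

    def handle_event(self, appended_text: str, structural: bool) -> None:
        if structural:
            # Reasoning update, tool call, image, etc.: send a full checkpoint.
            # It already contains any buffered text, so the buffer is dropped.
            self.pending_text_delta.clear()
            self.emit("chat:completion", {"content": self.serialize()})
        else:
            # Common case: characters appended to the active message block.
            if appended_text:
                self.pending_text_delta.append(appended_text)
            self.flush()

    def flush(self) -> None:
        # Send all buffered characters as one lightweight delta event.
        if self.pending_text_delta:
            self.emit("chat:message:delta", {"content": "".join(self.pending_text_delta)})
            self.pending_text_delta.clear()


if __name__ == "__main__":
    sent: list[tuple[str, dict]] = []
    output = {"text": ""}
    emitter = StreamEmitter(
        emit=lambda name, data: sent.append((name, data)),
        serialize=lambda: f"<p>{output['text']}</p>",  # toy HTML serializer
    )
    for chunk in ["Hel", "lo"]:
        output["text"] += chunk
        emitter.handle_event(chunk, structural=False)
    emitter.handle_event("", structural=True)  # e.g. a tool call starts
    for name, data in sent:
        print(name, data)
```

Each text-only event costs bytes proportional to its own characters; only the rarer structural events pay for a full re-serialization.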
The final done: true chat:completion event emitted by the outer handler at line 4793 overwrites message.content with the canonical serialize_output(output), reconciling the frontend to the exact backend state at the end of every response.
Discovered and diagnosed by @shirasawasama; this is the Phase 1 minimum-viable landing. Phase 2 (typed block model, per-block rendering, Redis op log with resume-by-seq) is tracked as follow-up work.
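The end-of-response reconciliation can be modeled from the client's side. The toy ClientMessage below is hypothetical (the real handler lives in Chat.svelte and is not reproduced here): it appends delta text and replaces the content wholesale on chat:completion, so even a dropped delta is corrected once the done: true checkpoint arrives. The "content" and "done" payload keys are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ClientMessage:
    """Toy model of the frontend's view of one assistant message."""

    content: str = ""
    done: bool = False

    def apply(self, event_type: str, data: dict) -> None:
        if event_type == "chat:message:delta":
            # Text-only update: append to what is already rendered.
            self.content += data["content"]
        elif event_type == "chat:completion":
            # Full checkpoint: replace the content with the canonical value.
            if "content" in data:
                self.content = data["content"]
            self.done = bool(data.get("done", False))


if __name__ == "__main__":
    msg = ClientMessage()
    msg.apply("chat:message:delta", {"content": "Hel"})
    # Suppose the "lo" delta is lost in transit; the final checkpoint repairs it.
    msg.apply("chat:completion", {"done": True, "content": "<p>Hello</p>"})
    print(msg)
```

Because the last event always carries the full canonical content, the delta stream only has to be good enough for live display, not perfectly reliable.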
Impact summary:
not full response size
existing done: true full-content checkpoint.
Out of scope for this change:
(Phase 2 will incrementalize it via per-block deltas)
already emitted raw SSE deltas and is unchanged
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.