[PR #23735] [CLOSED] perf: emit text tokens as deltas instead of re-serializing the whole chat per SSE event #114652

Closed
opened 2026-05-18 15:27:48 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/23735
Author: @Classic298
Created: 4/14/2026
Status: Closed

Base: dev ← Head: perf/stream-delta-emit


📝 Commits (3)

  • 07ee26b docs: add writeup for O(N^2) WebSocket payload bloat during LLM streaming
  • d7b36be perf(stream): emit text tokens as deltas instead of re-serializing the whole chat per SSE event
  • 6155105 style(stream): tighten comments on delta-emit change

📊 Changes

1 file changed (+67 additions, -12 deletions)

📝 backend/open_webui/utils/middleware.py (+67 -12)

📄 Description

Partially fixes https://github.com/open-webui/open-webui/issues/23733

The streaming hot path in streaming_chat_response_handler.stream_body_handler was calling serialize_output(full_output()) on every SSE event — rebuilding an HTML string of the entire accumulated output (text + reasoning + tool calls + images + citations) and emitting it via Socket.IO.

For an N-token response this is O(N) work per token, and therefore O(N^2) bytes in total across the WebSocket. That cost is amplified by Socket.IO's AsyncRedisManager pub/sub (x Redis nodes x workers) and a third time by the frontend Markdown re-parser. A 2000-token reply at normal scale could push ~10,000x more bytes through Redis than the new tokens actually needed.
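To make the quadratic growth concrete, here is a back-of-the-envelope sketch. It assumes one SSE event per token, 4 characters per token, and 1 byte per character; `fan_out` is a hypothetical multiplier standing in for the Redis nodes x workers amplification. None of these numbers are measurements from the PR.

```python
def streamed_bytes(num_tokens: int, chars_per_token: int = 4, fan_out: int = 1) -> tuple[int, int]:
    """Return (full_serialize_bytes, needed_bytes) for one response.

    Illustrative model only: one event per token, one byte per character.
    """
    # Full-serialize path: event k re-sends all k tokens accumulated so far,
    # and every event is multiplied by the (hypothetical) pub/sub fan-out.
    full = fan_out * sum(k * chars_per_token for k in range(1, num_tokens + 1))
    # Bytes the new tokens actually needed: each token sent exactly once.
    needed = num_tokens * chars_per_token
    return full, needed


full, needed = streamed_bytes(2000)
print(full, needed, full / needed)  # 8004000 8000 1000.5

full_fanned, _ = streamed_bytes(2000, fan_out=10)
print(full_fanned / needed)  # 10005.0
```

Under these assumptions, re-serializing alone accounts for roughly a 1000x overhead on a 2000-token reply, and a fan-out of about 10 would reach the ~10,000x order of magnitude quoted above.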

Fix: when an SSE event only appended text characters to the currently active message block (the overwhelmingly common case), accumulate those characters into a pending_text_delta buffer and emit them as a lightweight chat:message:delta event. The frontend already handles that event via string append (Chat.svelte:472-473), so no frontend changes are needed.
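The buffering idea can be sketched roughly as follows. This is not the code in middleware.py: the `DeltaEmitter` class, its method names, and the payload shape `{"type": ..., "data": {"content": ...}}` are assumptions for illustration, and a plain list stands in for the Socket.IO emit.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaEmitter:
    """Hypothetical stand-in for the text-only branch of the stream handler."""

    events: list = field(default_factory=list)  # what would be sent over Socket.IO
    pending_text_delta: str = ""

    def on_text(self, chars: str) -> None:
        # A text-only SSE event: remember the new characters, nothing else.
        self.pending_text_delta += chars

    def flush(self) -> None:
        # Emit only the characters added since the last flush, then reset.
        if self.pending_text_delta:
            self.events.append(
                {"type": "chat:message:delta", "data": {"content": self.pending_text_delta}}
            )
            self.pending_text_delta = ""


emitter = DeltaEmitter()
emitter.on_text("Hel")
emitter.on_text("lo")
emitter.flush()
emitter.flush()  # no-op: nothing is pending
print(emitter.events)
```

The payload size of each emitted event depends only on the newly buffered characters, not on how much text the response has accumulated.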

Any structural event - reasoning update, reasoning->message transition, tool call, image attachment, tag-block append, annotation source, etc. - still goes through the legacy full-serialize path, and when it does it clears the pending text delta buffer (because the full-content checkpoint already contains the accumulated text). Ordering is preserved by flushing pending deltas before each structural emit.
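One possible reading of the structural path, as a toy model: a structural event emits a full-content checkpoint and drops whatever text was still buffered, because the checkpoint already includes it. The `flush_every` batching threshold, the `[...]` marker used in place of `serialize_output(full_output())`, and the use of `chat:completion` for non-final checkpoints are all assumptions made for this sketch.

```python
def run_stream(events, flush_every=16):
    """events: iterable of ("text", chars) or ("structural", label)."""
    content = ""             # canonical accumulated output on the backend
    pending_text_delta = ""  # text appended since the last emit
    emitted = []

    for kind, payload in events:
        if kind == "text":
            content += payload
            pending_text_delta += payload
            if len(pending_text_delta) >= flush_every:
                emitted.append(("chat:message:delta", pending_text_delta))
                pending_text_delta = ""
        else:
            # Stand-in for the legacy full-serialize path.
            content += f"[{payload}]"
            emitted.append(("chat:completion", content))
            # The checkpoint already carries the buffered text, so drop it.
            pending_text_delta = ""

    if pending_text_delta:
        emitted.append(("chat:message:delta", pending_text_delta))
    return content, emitted


def replay(emitted):
    """Client view: deltas append, checkpoints overwrite."""
    shown = ""
    for kind, payload in emitted:
        shown = shown + payload if kind == "chat:message:delta" else payload
    return shown


content, emitted = run_stream(
    [("text", "Hello "), ("text", "wor"), ("structural", "tool_call"), ("text", "ld")]
)
print(emitted)
print(replay(emitted) == content)  # True
```

Flushing the buffered text as a delta immediately before the checkpoint would yield the same final client state, since the checkpoint overwrites the content either way.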

The final done: true chat:completion emitted by the outer handler at line 4793 overwrites message.content with the canonical serialize_output(output), reconciling the frontend to the exact backend state at the end of every response.
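The reconciliation step can be illustrated with a small client-side reducer. It is written in Python for consistency with the backend, whereas the real handler lives in Chat.svelte; the event payload shape and the `apply_event` name are assumptions, not the frontend's actual code.

```python
def apply_event(message: dict, event: dict) -> dict:
    """Apply one streamed event to a message, mimicking the client's behaviour."""
    data = event["data"]
    if event["type"] == "chat:message:delta":
        # Lightweight path: append the new characters.
        message["content"] += data["content"]
    elif event["type"] == "chat:completion":
        # Full-content checkpoint: overwrite with the canonical content.
        if "content" in data:
            message["content"] = data["content"]
        if data.get("done"):
            message["done"] = True
    return message


message = {"content": "", "done": False}
stream = [
    {"type": "chat:message:delta", "data": {"content": "Hel"}},
    {"type": "chat:message:delta", "data": {"content": "lo"}},
    # Final checkpoint: whatever the deltas produced is replaced wholesale.
    {"type": "chat:completion", "data": {"content": "Hello, world", "done": True}},
]
for event in stream:
    apply_event(message, event)
print(message)  # {'content': 'Hello, world', 'done': True}
```

Because the final event overwrites rather than appends, any drift between the appended deltas and the backend state is corrected at the end of the response.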

Discovered and diagnosed by @shirasawasama; this is the Phase 1 minimum-viable landing. Phase 2 (typed block model, per-block rendering, Redis op log with resume-by-seq) is tracked as follow-up work.

Impact summary:

  • Per-token WS payload: O(size_so_far) -> O(new_chars)
  • Redis pub/sub bytes: drops by ~10-100x on long responses
  • Frontend Markdown re-parse cost: now bounded by delta size per event,
    not full response size
  • Semantics: identical. Final message content reconciles via the
    existing done: true full-content checkpoint.

Out of scope for this change:

  • Reasoning-block content still takes the full-serialize path per token
    (Phase 2 will incrementalize it via per-block deltas)
  • Tool-call argument streaming still takes the full-serialize path
  • DB persistence path (REALTIME_CHAT_SAVE) is untouched — its WS path
    already emitted raw SSE deltas and is unchanged

Contributor License Agreement

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-18 15:27:48 -05:00
Reference: github-starred/open-webui#114652