[PR #23736] feat: resumable WS streaming via Redis log with seq-based replay #50391

Open
opened 2026-04-30 03:05:15 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/23736
Author: @Classic298
Created: 4/14/2026
Status: 🔄 Open

Base: dev ← Head: feat/stream-resume-redis


📝 Commits (10+)

  • 8289ac7 feat(stream): resumable WS streaming via Redis log with seq-based replay
  • ff48c99 fix(stream): address code review findings on resume-stream
  • dec9176 fix(stream): resume every in-flight assistant message, not just the current
  • 7dc4d84 fix(stream): harden done-detection and isolate resume seq from persisted state
  • 860bde8 fix(stream): close replay race, cursor-efficient resume, bounded client bookkeeping
  • d9e2ffc fix(stream): truncate stale resume log at emitter creation
  • 46d54c6 fix(stream): replace delayed-truncate task with TTL shortening on done
  • 5533368 style+fix(stream): tighten comments and address two review findings
  • 21ef755 fix(stream): close replay/live race and drop taskIds gate
  • d655395 fix(stream): unconditional completion, fence timeout, single-batch replay

📊 Changes

2 files changed (+567 additions, -5 deletions)

View changed files

📝 backend/open_webui/socket/main.py (+347 -5)
📝 src/lib/components/chat/Chat.svelte (+220 -0)

📄 Description

This PR SHOULD NOT be merged without also merging https://github.com/open-webui/open-webui/pull/23859

Adds a bounded Redis stream log for every in-flight assistant message so clients can reconnect mid-stream (page refresh, network drop, device switch) and catch up on frames they missed without re-fetching the full chat from the database.

Problem this solves

With ENABLE_REALTIME_CHAT_SAVE=False (the default), the backend does not write the assistant message to the DB until the stream finishes. If the client refreshes the page mid-stream, the chat load from the DB returns nothing for the in-progress message, and the response appears to vanish until the stream eventually completes — users are left staring at an empty chat while the backend quietly keeps emitting tokens into the void.

Design

  • Every outbound WS envelope gets stamped with a monotonic per-message seq inside get_event_emitter and appended to a bounded Redis stream keyed {REDIS_KEY_PREFIX}:stream:{message_id}. MAXLEN ~ 2000 entries, TTL 1h as a safety net.
  • Clients track message.lastSeq in Chat.svelte as events arrive. On chat load (mid-stream refresh) and on socket reconnect they emit resume-stream {chat_id, message_id, last_seq}.
  • The server authenticates the session, verifies the user owns the chat, XRANGEs the log, filters by seq > last_seq, and emits the missed envelopes to THAT session only (via to=sid) so live listeners in the user room keep receiving their normal live stream unchanged.
  • The existing chat event handler drops any envelope with seq <= message.lastSeq, making replay idempotent against live frames that race the replay after a reconnect.
  • When an event with done: True fires, a background task truncates the log after a 30s grace window so late reconnects still catch the finalization; anything beyond that resumes from the now-up-to-date DB.
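The seq-stamping and replay flow above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `StreamLog` is an in-memory stand-in for the bounded Redis stream (XADD with MAXLEN plus XRANGE), and all class and function names here are hypothetical.

```python
from collections import deque
from itertools import count


class StreamLog:
    """In-memory stand-in for a bounded Redis stream (XADD MAXLEN ~ N / XRANGE)."""

    def __init__(self, maxlen: int = 2000):
        # deque(maxlen=...) mimics MAXLEN trimming: oldest entries fall off.
        self.entries: deque = deque(maxlen=maxlen)

    def append(self, envelope: dict) -> None:
        self.entries.append(envelope)

    def range(self) -> list:
        return list(self.entries)


class SeqEmitter:
    """Stamps every outbound envelope with a monotonic per-message seq and logs it."""

    def __init__(self, log: StreamLog):
        self.log = log
        self._seq = count(1)

    def emit(self, envelope: dict) -> dict:
        stamped = {**envelope, "seq": next(self._seq)}
        self.log.append(stamped)
        # In the real emitter, the stamped envelope is also sent over the WS here.
        return stamped


def replay_missed(log: StreamLog, last_seq: int) -> list:
    """Return only the frames the client has not yet seen (seq > last_seq)."""
    return [e for e in log.range() if e["seq"] > last_seq]
```

On resume, the server would send the output of `replay_missed(...)` to the reconnecting sid only, while the client keeps dropping any live frame whose seq is at or below its recorded lastSeq — so replayed and live frames never double-apply.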

Orthogonality

Zero touches to middleware.py or the streaming hot path. The log stores whatever gets emitted; any future change to emit shape (chat:message:delta, per-block ops, JSON Patch, ...) is logged and replayed verbatim with no coupling.

Graceful degradation

No-op when Redis is not configured (WEBSOCKET_MANAGER != 'redis'). In that deployment mode, refresh during streaming retains the current behavior of waiting for the stream to complete.
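One way the no-op fallback could be structured (a sketch under assumptions, not the PR's actual implementation): the emitter asks a factory for a log, and when the websocket manager is not Redis it receives a null object whose replay is always empty, so every resume-stream request degrades to "nothing to replay".

```python
import os


class NullStreamLog:
    """No-op log used when Redis is not configured: nothing is recorded or replayed."""

    def append(self, envelope: dict) -> None:
        pass  # intentionally drops the envelope

    def replay_missed(self, last_seq: int) -> list:
        return []  # a resume request always comes back empty


def make_stream_log(message_id: str):
    # Hypothetical factory: only build a real Redis-backed log when the
    # websocket manager is Redis; otherwise fall back to the no-op object.
    if os.environ.get("WEBSOCKET_MANAGER") != "redis":
        return NullStreamLog()
    raise NotImplementedError("a Redis-backed log would be constructed here")
```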

Auth model

The log is keyed by message_id only. The resume handler must do a chat ownership check (Chats.get_chat_by_id_and_user_id) before replaying, so a malicious client cannot read another user's stream by guessing a message_id.
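The ownership gate might be sketched like this; `get_chat_by_id_and_user_id` below is a stand-in for the real Chats.get_chat_by_id_and_user_id lookup, with the chat table faked in memory.

```python
# Fake chat table standing in for the DB; maps chat_id -> owner user_id.
CHATS = {"chat-1": "alice"}


def get_chat_by_id_and_user_id(chat_id: str, user_id: str):
    """Stand-in for Chats.get_chat_by_id_and_user_id: returns the chat only to its owner."""
    return {"id": chat_id} if CHATS.get(chat_id) == user_id else None


def handle_resume_stream(user_id: str, chat_id: str,
                         frames: list, last_seq: int) -> list:
    # Ownership check first: a client that guessed a message_id but does
    # not own the chat gets nothing replayed.
    if get_chat_by_id_and_user_id(chat_id, user_id) is None:
        return []
    return [f for f in frames if f["seq"] > last_seq]
```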

Contributor License Agreement

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-30 03:05:15 -05:00

Reference: github-starred/open-webui#50391