issue: streaming events are slow when the user has multiple Open WebUI tabs open in the browser #5940

Open
opened 2025-11-11 16:39:31 -06:00 by GiteaMirror · 10 comments

Originally created by @frdeng on GitHub (Aug 1, 2025).

Originally assigned to: @tjbck on GitHub.

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

k8s/pg/redis

Open WebUI Version

0.6.18

Ollama Version (if applicable)

No response

Operating System

k8s with multiple pods

Browser (if applicable)

chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When the user has multiple Open WebUI tabs open, the backend stores a session for each client, and when it starts streaming data it tries to emit the event data to all of these clients:

b8da4a8cd8/backend/open_webui/socket/main.py (L636-L645)

I don't see the point of including the user's other sessions here; it becomes extremely slow when the user has many tabs open in the browser.

If it's not necessary, I'll go ahead and create a PR to fix this.
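For context, here is a minimal self-contained sketch of the pattern the linked lines implement (the names `USER_POOL` and `emit` below are simplified stand-ins, not the actual Open WebUI code): every stream chunk is emitted to every session in the user's pool, so per-chunk work scales with the number of open tabs.

```python
import asyncio

# Hypothetical stand-ins for Open WebUI's Redis-backed user pool and sio.emit().
USER_POOL = {"user-1": ["sid-a", "sid-b", "sid-c"]}  # one sid per open tab
emit_count = 0

async def emit(event, data, to):
    global emit_count
    emit_count += 1  # real code would do: await sio.emit(event, data, to=session_id)

async def stream_response(user_id, chunks):
    # Each chunk is emitted to *all* of the user's sessions, so the total
    # emit cost is len(chunks) * len(sessions).
    for chunk in chunks:
        for session_id in USER_POOL.get(user_id, []):
            await emit("chat-events", {"data": chunk}, to=session_id)

asyncio.run(stream_response("user-1", ["tok"] * 100))
print(emit_count)  # 100 chunks * 3 tabs = 300 emits
```

With a single tab the same stream would cost 100 emits, which is why the slowdown grows with the tab count.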

Actual Behavior

The backend broadcasts streaming data to all of the user's sessions instead of only the current session.

Steps to Reproduce

1. Deploy Open WebUI in k8s: multiple pods with Redis.
2. Open multiple Open WebUI tabs and send a request; the response is much slower than with only one tab open.

Logs & Screenshots

n/a

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 16:39:31 -06:00

@rgaricano commented on GitHub (Aug 2, 2025):

If not, how can I use the same user to work with different chats or client instances (e.g. browser, code helper, desktop helper, ...)?

Maybe it's enough to add a warning that opening many tabs can slow down response time...


@frdeng commented on GitHub (Aug 2, 2025):

> If not, how can I use the same user to work with different chats or client instances (e.g. browser, code helper, desktop helper, ...)?
>
> Maybe it's enough to add a warning that opening many tabs can slow down response time...

OK, it would make sense if the same user has multiple clients (browser, or other apps) interacting with the same chat.
But if the other clients are doing other things (other chats, etc.), then it makes no sense to send data to them.
Is this correct? Maybe I missed some use cases.


@frdeng commented on GitHub (Aug 2, 2025):

Another issue is that `USER_POOL.get(user_id, [])` is called for every single stream data chunk; with Redis, each call makes a request. That adds a little latency, though it's not too bad compared to the multiple concurrent `sio.emit()` calls.
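The mitigation implied here can be sketched as follows (hypothetical code, not a proposed patch): resolve the session list once per stream instead of once per chunk, so only one Redis round trip is paid per response.

```python
lookup_count = 0

def get_sessions(user_id):
    """Stand-in for USER_POOL.get(user_id, []); with Redis, each call is a round trip."""
    global lookup_count
    lookup_count += 1
    return ["sid-a", "sid-b"]

def stream_per_chunk_lookup(user_id, chunks):
    # Current pattern: one pool lookup (Redis request) per stream chunk.
    for chunk in chunks:
        for sid in get_sessions(user_id):
            pass  # emit to sid

def stream_hoisted_lookup(user_id, chunks):
    # Hoisted pattern: one pool lookup for the whole stream.
    sessions = get_sessions(user_id)
    for chunk in chunks:
        for sid in sessions:
            pass  # emit to sid

stream_per_chunk_lookup("u", range(50))
per_chunk = lookup_count
lookup_count = 0
stream_hoisted_lookup("u", range(50))
hoisted = lookup_count
print(per_chunk, hoisted)  # 50 vs 1
```

The trade-off: with a hoisted lookup, a session that connects mid-stream would not receive the remaining chunks, so a real fix would need to decide whether that matters.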


@rgaricano commented on GitHub (Aug 2, 2025):

Yes, I think so; sync should be requested by the client or by the server only when it's needed.
Maybe a kind of sync-requirement pool? (Some clients may need to be synced, others maybe not, and only when they request it.) It would also need an emitter manager to redirect events to each synced client, or to enqueue events while waiting for a request...
If you think it's feasible, give it a try.
Although I would wait for other opinions from people with more knowledge.


@frdeng commented on GitHub (Aug 2, 2025):

An idea: we could probably make use of Socket.IO namespaces and rooms, similar to how we handle channels, but for chats.

Basically the backend would create a room for each chat; if multiple sessions are in the same chat (room), it would emit the event to the room members.
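A toy model of that idea (pure Python, only simulating room membership; in python-socketio this would use the server's room primitives, i.e. entering a room when a chat is opened and emitting with a `room=` target):

```python
# Minimal simulation of room-scoped emits: only sessions that joined the
# chat's room receive its stream events, instead of every session of the user.
rooms: dict[str, set[str]] = {}

def enter_room(sid, room):
    # Real code would be the python-socketio server's enter_room call.
    rooms.setdefault(room, set()).add(sid)

def emit_to_room(room):
    # Returns the sids that would receive the event for this room.
    return sorted(rooms.get(room, set()))

# Tab A and a desktop client both have chat 123 open; tab B is on another chat.
enter_room("sid-a", "chat:123")
enter_room("sid-desktop", "chat:123")
enter_room("sid-b", "chat:456")

print(emit_to_room("chat:123"))  # ['sid-a', 'sid-desktop'] -- sid-b is skipped
```

This preserves the multi-client use case @rgaricano raised (several clients viewing the same chat still get the stream) while sessions on other chats stop paying for it.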


@sihyeonn commented on GitHub (Aug 16, 2025):

Hi @frdeng! I was wondering if this commit (1a93891d97) actually helps address this issue. It seems that simply setting CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE to 2 or 3 might be more effective. Could you please clarify the intention behind this change, or share your thoughts? Thank you!
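For reference, a sketch of what delta-chunk batching does in principle (hypothetical code, assuming `CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE` simply coalesces N stream deltas into one emitted event):

```python
def batch_deltas(deltas, chunk_size):
    """Coalesce consecutive stream deltas so each emitted event carries chunk_size deltas."""
    batched, buf = [], []
    for delta in deltas:
        buf.append(delta)
        if len(buf) == chunk_size:
            batched.append("".join(buf))
            buf = []
    if buf:  # flush the remainder so no tokens are lost
        batched.append("".join(buf))
    return batched

deltas = ["a", "b", "c", "d", "e"]
print(batch_deltas(deltas, 2))  # ['ab', 'cd', 'e'] -- 3 emits instead of 5
```

A chunk size of 2 or 3 roughly halves or thirds the emit count, which multiplies with the per-session fan-out discussed above; the cost is slightly chunkier visual streaming.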


@frdeng commented on GitHub (Aug 16, 2025):

> Hi @frdeng! I was wondering if this commit (1a93891) actually helps address this issue. It seems that simply setting CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE to 2 or 3 might be more effective. Could you please clarify the intention behind this change, or share your thoughts? Thank you!

This is a new feature, right? I think it would definitely help.
I tend to agree that the stream events should be sent to all user sessions. Maybe we could extend this feature so that, for the non-current sessions, we only send one final event with the stream data of all chunks combined?

Another related issue is the session pool cache: the user session count doesn't necessarily match the actual browser tab count. I've noticed that over time a user can end up with dozens of sessions due to ungraceful disconnects or other reasons (pod restarts, race conditions on cache updates between pods, etc.). We should have a stale-session detection and cleanup mechanism for the Redis session pool and user pool cache.
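The stale-session cleanup suggested here could look roughly like this (hypothetical names and TTL value; a real implementation would operate on the Redis-backed SESSION_POOL/USER_POOL): track a last-seen timestamp per session, refresh it on activity, and prune entries past a TTL.

```python
import time

SESSION_TTL = 300  # seconds; hypothetical value
last_seen: dict[str, float] = {}

def touch(session_id, now=None):
    # Refresh the session's last-seen timestamp on any activity (connect, ping, emit ack).
    last_seen[session_id] = time.monotonic() if now is None else now

def prune_stale(now=None):
    # Remove sessions whose last activity is older than the TTL.
    now = time.monotonic() if now is None else now
    stale = [sid for sid, ts in last_seen.items() if now - ts > SESSION_TTL]
    for sid in stale:
        del last_seen[sid]  # real code: also remove from SESSION_POOL/USER_POOL
    return stale

touch("sid-live", now=1000.0)
touch("sid-zombie", now=100.0)  # e.g. left over from an ungraceful disconnect
print(prune_stale(now=1000.0))  # ['sid-zombie']
print(sorted(last_seen))        # ['sid-live']
```

With Redis specifically, per-session keys with a native expiry could achieve the same effect without a sweep loop, at the cost of needing periodic refreshes from live connections.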


@sihyeonn commented on GitHub (Aug 17, 2025):

@frdeng Your points regarding stream event handling and session cleanup are very insightful. This PR (#16693) directly addresses the session management concerns you raised with a TTL-based cleanup mechanism, ensuring SESSION_POOL and USER_POOL stay synchronized. The simplification of the disconnect logic and the new abstraction layer are also positive steps.

@tjbck @rgaricano Regarding the idea of sending a final combined event to non-current sessions, I think we could implement that logic on top of the new TTL framework. For example, when a session expires, we could trigger a final aggregated event instead of streaming updates. Would it make sense to proceed with this? I'd be happy to take it on if you're okay with it! 😊


@rgaricano commented on GitHub (Aug 18, 2025):

@sihyeonn
About the TTL PR:
In my opinion, as a simple contributor who hasn't delved into the details of this topic:
This type of implementation requires great care and extensive testing and monitoring. These procedures add additional layers of processing that can create further unwanted bottlenecks, starting with the wide variety of configurations Open WebUI runs under, both in terms of systems and of the usage and load it has to support.
Implementing a timer for the session pool, without other kinds of verification and limit checks, doesn't seem like a good idea to me. It can result in orphaned processes, or cut off transactions that could have continued, for example in complex configurations that use a session for data exchange, streaming, information processing, monitoring, etc.
I'm not saying it's not a good idea; I just think that, in the current state, an isolated proposal isn't the most appropriate, and that any addition in this area should be part of very well-planned functionality and structure, with clear objectives and implementation steps. In any case, I would first implement it in a lab version.


@Ithanil commented on GitHub (Aug 23, 2025):

> > Hi @frdeng! I was wondering if this commit (1a93891) actually helps address this issue. It seems that simply setting CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE to 2 or 3 might be more effective. Could you please clarify the intention behind this change, or share your thoughts? Thank you!
>
> This is a new feature, right? I think it would definitely help. I tend to agree that the stream events should be sent to all user sessions; maybe we could extend this feature so that, for the non-current sessions, we only send one final event with the stream data of all chunks combined?
>
> Another related issue is the session pool cache: the user session count doesn't necessarily match the actual browser tab count. I've noticed that over time a user can end up with dozens of sessions due to ungraceful disconnects or other reasons (pod restarts, race conditions on cache updates between pods, etc.). We should have a stale-session detection and cleanup mechanism for the Redis session pool and user pool cache.

Yes, there is definitely a permanent upward drift in active sessions/users, and yes, for users with many active sessions, models with high token/s rates ultimately kill the system and Redis. That is what led me to the PR that resulted in 1a93891d97, but we definitely need a solution for the root problem. I haven't delved into https://github.com/open-webui/open-webui/pull/16693 yet, but I very much hope it is the answer.

Reference: github-starred/open-webui#5940