[GH-ISSUE #14945] issue: Performance degradation as active users increase due to "Active Users" count & user-list emitters #17415

Closed
opened 2026-04-19 23:09:34 -05:00 by GiteaMirror · 22 comments

Originally created by @taylorwilsdon on GitHub (Jun 12, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/14945

Originally assigned to: @tjbck on GitHub.

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.6.14

Ollama Version (if applicable)

No response

Operating System

Ubuntu 22

Browser (if applicable)

Chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Load should scale roughly linearly with the associated infrastructure when operating in a distributed / multi-node environment (i.e., double your users and double your hardware, and you should see similar performance).

Actual Behavior

Once a certain volume of users is present and websockets + Redis are in use, chat streaming performance begins to degrade substantially, to the point that chat completions never finish no matter how much infrastructure you throw at it. This appears to be largely because the number of server-sent websocket events to each client skyrockets under heavy user activity: every join and leave fires a user-list and a usage event. Under low load it streams nicely, but as the event frequency increases, even an M4 Max with plenty of headroom starts to struggle and eventually becomes overwhelmed.

https://github.com/user-attachments/assets/e7ab5146-5f88-4579-9e8b-f201c4ca5a85
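
To put rough numbers on that (a back-of-envelope sketch; the churn rate is an assumption, not a measurement): every join or leave broadcasts to every connected client, so the total emit volume grows with the square of the active user count.

    # Back-of-envelope: every join/leave fans a user-list (and usage) emit
    # out to ALL connected clients, so volume grows ~quadratically.
    # The churn rate (1 join/leave per user per minute) is an assumption.
    def emits_per_second(active_users: int, churn_per_user_per_min: float = 1.0) -> float:
        joins_and_leaves_per_sec = active_users * churn_per_user_per_min / 60
        return joins_and_leaves_per_sec * active_users  # one emit per client per event

    for n in (200, 1300, 5000):
        print(f"{n:>5} users -> ~{emits_per_second(n):,.0f} emits/sec")

Under those assumptions, 200 users produce a few hundred emits per second, while 1300 users already produce tens of thousands, which matches the observation that the problem only becomes visible on large instances.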

We commented out these bits in websocket.ts (https://github.com/open-webui/open-webui/blob/main/src/lib/utils/websocket.ts#L43):

	_socket.on('user-list', (data) => {
		console.log('user-list', data);
		activeUserIds.set(data.user_ids);
	});

	_socket.on('usage', (data) => {
		console.log('usage', data);
		USAGE_POOL.set(data['models']);
	});

And these in socket/main.py (https://github.com/open-webui/open-webui/blob/main/backend/open_webui/socket/main.py):

	async def connect(sid, environ, auth):
	    ...
	    await sio.emit("user-list", {"user_ids": list(USER_POOL.keys())})
	    await sio.emit("usage", {"models": get_models_in_use()})

	async def user_join(sid, data):
	    ...
	    await sio.emit("user-list", {"user_ids": list(USER_POOL.keys())})

	async def user_list(sid):
	    ...  # etc.

Making that change resulted in an immediate and dramatic reduction in load on the backend hosts, and better performance on the clients:

Image: https://github.com/user-attachments/assets/99476b61-7897-4c35-9f27-72131a908531

Steps to Reproduce

Have 1300 people use an open-webui instance

Logs & Screenshots

https://github.com/user-attachments/assets/e7ab5146-5f88-4579-9e8b-f201c4ca5a85

Image: https://github.com/user-attachments/assets/99476b61-7897-4c35-9f27-72131a908531

Additional Information

My proposed solution (will have an associated PR, @tjbck) is a new environment variable, ENABLE_USER_POOL_EVENTS, that when disabled simply skips the event emits, at the cost of losing the "Active Users" count in the settings menu.
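
A minimal sketch of what that gate might look like in socket/main.py (the variable name comes from the proposal above; the placement is illustrative, not the merged implementation):

    # Hypothetical sketch: gate the user-pool broadcasts behind an env var.
    # sio, USER_POOL, and get_models_in_use come from the surrounding module.
    import os

    ENABLE_USER_POOL_EVENTS = (
        os.environ.get("ENABLE_USER_POOL_EVENTS", "true").lower() == "true"
    )

    async def connect(sid, environ, auth):
        ...
        if ENABLE_USER_POOL_EVENTS:
            await sio.emit("user-list", {"user_ids": list(USER_POOL.keys())})
            await sio.emit("usage", {"models": get_models_in_use()})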

It's also probably worth removing the console.log statements (e.g. console.log('user-list', data);) in general, even if you do want the counts streamed and the feature enabled, because heavy console.log usage will degrade client performance at high volumes even in ideal circumstances.

GiteaMirror added the bug label 2026-04-19 23:09:34 -05:00

@taylorwilsdon commented on GitHub (Jun 12, 2025):

relates to https://github.com/open-webui/open-webui/issues/13026


@Classic298 commented on GitHub (Jun 13, 2025):

So disabling the user count is leading to such drastic performance improvements already? Interesting


@rgaricano commented on GitHub (Jun 13, 2025):

I was thinking the same, but it seems impossible that such memory use was due only to those functions. We are talking about more than 50%, and in the GB range!

Unless it is due to some "blocking" effect that prevents memory from being released (which I don't see).


@rgaricano commented on GitHub (Jun 13, 2025):

A question: why those awaits?

https://github.com/open-webui/open-webui/blob/63256136ef8322210c01c2bb322097d1ccfb8c6f/backend/open_webui/socket/main.py#L174
https://github.com/open-webui/open-webui/blob/63256136ef8322210c01c2bb322097d1ccfb8c6f/backend/open_webui/socket/main.py#L195

Wouldn't it be enough to set a send_usage = True flag and leave the actual update to the pool loop?

And/or release memory after the emits for both user-list & usage (call release_func() after those awaits)?
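
A rough sketch of that flag-plus-loop idea (the names send_user_list and pool_broadcast_loop and the one-second interval are hypothetical, not from the actual codebase):

    # Hypothetical debounce: handlers only set a dirty flag; one periodic
    # task coalesces any number of joins/leaves into at most one broadcast
    # per interval, instead of one broadcast per join/leave.
    import asyncio

    send_user_list = False

    async def user_join(sid, data):
        global send_user_list
        ...
        send_user_list = True  # just mark dirty; don't emit per join

    async def pool_broadcast_loop():
        global send_user_list
        while True:
            await asyncio.sleep(1)  # broadcast at most once per second
            if send_user_list:
                send_user_list = False
                await sio.emit("user-list", {"user_ids": list(USER_POOL.keys())})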


@Ithanil commented on GitHub (Jun 13, 2025):

Just a question, is this somehow a regression with 0.6.14? Because on 0.6.13 we didn't have issues serving 200+ active users concurrently, despite the user list logging to console.


@Classic298 commented on GitHub (Jun 13, 2025):

@Ithanil no, not new. 0.6.14 works fine for me. This seems to be an existing issue, related to the user count on very large instances


@Ithanil commented on GitHub (Jun 13, 2025):

> @Ithanil no, not new. 0.6.14 works fine for me. This seems to be an existing issue, related to the user count on very large instances

But he says 1000 users in a day; we had like 3000 new users within 2 hours after an official announcement mail.

Not saying this isn't an issue, but I don't see any reason whatsoever for the full user list being logged to console.
Also the console.log of every streaming chunk seems unnecessary, right?

@Ithanil commented on GitHub (Jun 13, 2025):

FYI: https://github.com/jhubbardsf/vite-plugin-svelte-console-remover

Also: https://github.com/jhubbardsf/vite-plugin-svelte-console-remover/issues/1#issue-1240211980

@Classic298 commented on GitHub (Jun 13, 2025):

Hm, yeah, the console log for every chunk is only in debug mode, I think.


@Ithanil commented on GitHub (Jun 13, 2025):

https://github.com/open-webui/open-webui/pull/14958

99% sure it won't be merged, but maybe it's interesting for you.


@taylorwilsdon commented on GitHub (Jun 13, 2025):

> Just a question, is this somehow a regression with 0.6.14? Because on 0.6.13 we didn't have issues serving 200+ active users concurrently, despite the user list logging to console.

This is 1200+ concurrent active users. It is not a visible problem under 1k or so. 200 is a breeze; I've never seen any issues at that level, and you can run one node no problem. At 5k or so users, 16 nodes.


@Ithanil commented on GitHub (Jun 13, 2025):

> > Just a question, is this somehow a regression with 0.6.14? Because on 0.6.13 we didn't have issues serving 200+ active users concurrently, despite the user list logging to console.
>
> This is 1200+ concurrent active users. It is not a visible problem under 1k or so. 200 is a breeze you can run one node. 5k or so user instance.

OK, I see. Thanks for the clarification!

We have a fully HA multi-node setup, and so far I was happy to see all components chilling, except for the GPU servers.


@taylorwilsdon commented on GitHub (Jun 13, 2025):

> > > Just a question, is this somehow a regression with 0.6.14? Because on 0.6.13 we didn't have issues serving 200+ active users concurrently, despite the user list logging to console.
> >
> > This is 1200+ concurrent active users. It is not a visible problem under 1k or so. 200 is a breeze you can run one node. 5k or so user instance.
>
> OK, I see. Thanks for clarification!
>
> We have a fully HA multi-node setup and so far I was happy to see all components chilling, except for the GPU-Servers.

Yeah, same setup here. Overall it scales linearly well, but this is definitely a bottleneck you will eventually hit, because you have thousands of clients getting thousands of events every second (any time the user count or usage list changes, which is constantly at scale). Commenting out that backend Python was indeed the sole change behind the drop in the AWS resource usage graph; it reduced egress calls from the cluster by millions of requests.

The open websocket waiting for chat completion streaming chunks is competing with these nonstop updates and ends up falling behind on the actual text response, so it's very visible on the client side even if you scale your backend infra to infinity. I got something like 3000 console logs from user-list while waiting for a 300-token response. Will have a PR today to fix.


@Classic298 commented on GitHub (Jun 13, 2025):

Good catch. Will love to see the pr


@tjbck commented on GitHub (Jun 16, 2025):

423a35782b may have already addressed the issue, will also follow up with additional PRs to disable the feature entirely!


@Ithanil commented on GitHub (Jun 16, 2025):

> 423a357 may have already addressed the issue, will also follow up with additional PRs to disable the feature entirely!

Losing the feature entirely is sad news ☹️


@tjbck commented on GitHub (Jun 16, 2025):

@Ithanil it should still work as intended? let me know if that's not the case for your deployment.


@Ithanil commented on GitHub (Jun 16, 2025):

> @Ithanil it should still work as intended? let me know if that's not the case for your deployment.

Oh, I understood "will also follow up with additional PRs to disable the feature entirely!" as if you were going to remove the user count display. Sorry, apparently a misunderstanding.


@tjbck commented on GitHub (Jun 16, 2025):

to provide an option to disable* 😅


@Classic298 commented on GitHub (Jun 16, 2025):

An option to disable is good. Save every % of performance possible haha


@Yash-Patidar commented on GitHub (Jun 16, 2025):

Is it possible to support both SSE and WebSocket, with the option to switch between them using an environment variable? Just curious — I know this might require significant changes, but it could be a good approach for scalability and better resource optimization. For example, even ChatGPT’s website uses SSE for handling streaming responses efficiently.
Image: https://github.com/user-attachments/assets/b77f8c8a-78e2-4db9-8e2d-4190a1ca2dc6
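
A minimal sketch of the SSE side of that idea using FastAPI, which Open WebUI's backend is built on (the endpoint path and the token source are made up for illustration, not part of the project):

    # Hypothetical illustration: streaming chat tokens over SSE with FastAPI.
    import asyncio
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    async def token_stream():
        # Stand-in for a real model completion stream.
        for token in ["Hello", ", ", "world", "!"]:
            yield f"data: {token}\n\n"  # SSE event framing
            await asyncio.sleep(0.05)

    @app.get("/api/chat/stream")
    async def chat_stream():
        return StreamingResponse(token_stream(), media_type="text/event-stream")

Unlike a websocket, each SSE response is a plain one-way HTTP stream, so a client waiting on completion chunks would not share its channel with broadcast events like user-list.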


@taylorwilsdon commented on GitHub (Jun 16, 2025):

> 423a357 may have already addressed the issue, will also follow up with additional PRs to disable the feature entirely!

Amazing, appreciate it! This is a much more sensible solution than my bool on/off haha

Reference: github-starred/open-webui#17415