issue: Unavailable direct connection chat with multiple workers due to WebSocket/API routing mismatch #5601

Open
opened 2025-11-11 16:25:56 -06:00 by GiteaMirror · 16 comments

Originally created by @ShirasawaSama on GitHub (Jun 20, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.6.15 (latest dev)

Ollama Version (if applicable)

No response

Operating System

MacOS 15.5

Browser (if applicable)

Edge 133

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

  1. OpenWebUI with direct connection mode enabled
  2. Multiple backend workers (workers > 1)
  3. WebSocket + SSE communication flow

Actual Behavior

When using "direct connection" mode with multiple backend workers, chat requests may time out because the WebSocket connection and the API request are routed to different worker instances.

  1. Requests get stuck without responses
  2. Eventually time out with error messages
  3. Issue occurs intermittently when multiple workers are deployed

Steps to Reproduce

  1. Configure OpenWebUI with multiple workers (workers > 1)

.env:

ENABLE_WEBSOCKET_SUPPORT=true
WEBSOCKET_MANAGER=redis
REDIS_URL=redis://default:password@localhost:6379/1
WEBSOCKET_REDIS_URL=redis://default:password@localhost:6379/1

Launch command:

$ uvicorn open_webui.main:app --host 0.0.0.0 --port 8080 --forwarded-allow-ips '*' --log-config ./uvicorn_config.json --workers 8

  2. Enable direct connection mode
  3. Send a direct connection chat request
  4. Observe that some requests time out without responses

Logs & Screenshots

Image

Additional Information

The current architecture has the following flow:

  1. Frontend sends request to /api/v1/completions
  2. Backend responds via WebSocket, instructing frontend to send SSE request to direct connection address /completions
  3. Frontend forwards received data back through WebSocket
  4. Backend starts listening for WebSocket messages from the initial request

Problem: When workers > 1, the WebSocket connection and the /api/v1/completions API request may be routed to different worker instances, causing the system to wait indefinitely for responses that will never arrive.

GiteaMirror added the bug label 2025-11-11 16:25:56 -06:00

@ShirasawaSama commented on GitHub (Jun 20, 2025):

We can add the following print statement to the generate_direct_chat_completion function in the backend/open_webui/utils/chat.py file:

Image

and to the backend/open_webui/socket/main.py file:

Image

and to the src/routes/+layout.svelte file:

Image

At this point, when the above problem occurs, we can observe that the PIDs of the two processes are completely different, and the process handling the HTTP request never receives the WebSocket data!

Image

If the request is normal and does not time out, the output is the following:

Image


@Zyfax commented on GitHub (Jun 20, 2025):

When operating with multiple instances, WEBUI_SECRET_KEY has to be identical on all of them.
With Docker, a random key is generated on boot, which will therefore cause a mismatch.

Image


@ShirasawaSama commented on GitHub (Jun 20, 2025):

When operating with multiple instances, WEBUI_SECRET_KEY has to be identical on all of them. With Docker, a random key is generated on boot, which will therefore cause a mismatch.

Image

I have added this environment variable and it still behaves the same:

$ WEBUI_SECRET_KEY=ifehiofhsefh uvicorn open_webui.main:app --host 0.0.0.0 --port 8080 --forwarded-allow-ips '*' --log-config ./uvicorn_config.json --workers 8

Image


@tjbck commented on GitHub (Jun 20, 2025):

Any reason why you have to utilise multiple workers instead of multi-replica setup?


@ShirasawaSama commented on GitHub (Jun 20, 2025):

Any reason why you have to utilise multiple workers instead of multi-replica setup?

yes, my .env file:

WEBUI_SECRET_KEY=hdjdn84kkwn
ENABLE_WEBSOCKET_SUPPORT=true
WEBSOCKET_MANAGER=redis
REDIS_URL=redis://default:password@localhost:6379/1
WEBSOCKET_REDIS_URL=redis://default:password@localhost:6379/1

@ShirasawaSama commented on GitHub (Jun 20, 2025):

Any reason why you have to utilise multiple workers instead of multi-replica setup?

In fact, I've tried both multiple k8s pod deployments and multiple-worker deployments. Either way, as long as the number of instances is greater than 1, there is a chance that a direct-connection chat gets stuck.

In addition, I read the official socket.io documentation, and the official Redis adapter seems to support only server-side push across multiple workers; data received from a client is not forwarded to all workers.

https://socket.io/docs/v4/redis-adapter/


@chemi392 commented on GitHub (Jun 23, 2025):

I am not using direct connections, but I do have multiple workers running on my instance, and I am experiencing the same behavior as of v0.6.15.


@hpavanatti commented on GitHub (Jun 24, 2025):

Same problem here: multiple workers running, and experiencing the same in v0.6.15 (it happens in v0.6.11 too!)


@Simon-Stone commented on GitHub (Sep 15, 2025):

This is still a problem in v0.6.28.


@Zyfax commented on GitHub (Sep 15, 2025):

This is still a problem in v0.6.28.

Enable direct connection under Admin Panel, Settings, Connections

Image


@ShirasawaSama commented on GitHub (Sep 15, 2025):

This is still a problem in v0.6.28.

Enable direct connection under Admin Panel, Settings, Connections

Image

Yes, when you enable this option and use multiple workers, this issue arises.


@Simon-Stone commented on GitHub (Sep 15, 2025):

It's not that the option to make a Direct Connection does not show up in the user settings. That works fine.

The issue is that requests made using a client-side Direct Connection do not succeed. The message just shows the placeholder indefinitely.

I believe that I narrowed the issue down a little bit further: This only seems to happen when streaming responses. When Stream Chat Response is set to off, the requests seem to go through just fine.

Would be nice if others with the same issue here could confirm that.


@ShirasawaSama commented on GitHub (Sep 15, 2025):

It's not that the option to make a Direct Connection does not show up in the user settings. That works fine.

The issue is that requests made using a client-side Direct Connection do not succeed. The message just shows the placeholder indefinitely.

I believe that I narrowed the issue down a little bit further: This only seems to happen when streaming responses. When Stream Chat Response is set to off, the requests seem to go through just fine.

Would be nice if others with the same issue here could confirm that.

Because non-streaming requests do not rely on Open WebUI's WebSocket broadcasting, they are unaffected.

The root cause is now very clear: as I mentioned earlier, socket.io does not broadcast the WebSocket data sent by a client to all servers. It may be necessary to refactor the entire direct-connection logic.


@Simon-Stone commented on GitHub (Sep 18, 2025):

Still an issue in v0.6.30


@jasonpnnl commented on GitHub (Oct 28, 2025):

Root Cause Analysis

The issue is caused by dynamic event handler registration in backend/open_webui/utils/chat.py:95 that only registers handlers in the local worker process, not globally across all workers.

Current Broken Flow

backend/open_webui/utils/chat.py:85-98:

channel = f"{user_id}:{session_id}:{request_id}"

if form_data.get("stream"):
    q = asyncio.Queue()

    async def message_listener(sid, data):
        await q.put(data)

    # BUG: This only registers the handler in the CURRENT worker!
    sio.on(channel, message_listener)

    res = await event_caller({...})  # Send RPC to browser

What Happens with Multiple Workers

  1. HTTP request POST /api/v1/completions → Worker A (random routing)
  2. Worker A registers handler sio.on(channel, message_listener) locally in Worker A's memory
  3. Worker A sends sio.call() RPC to browser → routes through Redis to Worker B (where WebSocket lives)
  4. Browser responds {'status': True} (RPC succeeds, returns to Worker A)
  5. Browser starts streaming chunks, emits to channel user_id:session_id:request_id
  6. Streaming chunks arrive at Worker B (where browser WebSocket is connected)
  7. Worker B searches for handler for event user_id:session_id:request_id
  8. Handler only exists in Worker A!
  9. Events are silently dropped
  10. Worker A's await q.get() waits forever
  11. Request times out

Why Other Socket.IO Events Work

All other socket.io handlers use static registration at module load time:

backend/open_webui/socket/main.py:

@sio.on("usage")          # Registered globally when module loads
@sio.on("user-join")      # Registered globally when module loads
@sio.on("events:channel") # Registered globally when module loads

These handlers exist in all workers, so events can be processed regardless of which worker receives them.


Proposed Fix: Use Global Handler + Redis Pub/Sub Routing

Replace per-request dynamic handlers with a global handler that routes messages via Redis.

Implementation Steps

1. Add Global Data Structures

File: backend/open_webui/socket/main.py

Add at module level, after sio initialization:

# Global registry for direct chat queues (per-worker, in-memory)
DIRECT_CHAT_QUEUES = {}  # {channel: asyncio.Queue}

# Redis-backed distributed tracking (optional, for monitoring)
if WEBSOCKET_MANAGER == "redis":
    DIRECT_CHAT_CHANNELS = RedisDict(
        redis=REDIS,
        prefix=f"{REDIS_KEY_PREFIX}:direct_chat_channels",
        fallback={}
    )
else:
    DIRECT_CHAT_CHANNELS = {}

2. Add Global Socket.IO Handler

File: backend/open_webui/socket/main.py

Add new global handler:

@sio.on("direct-chat-stream")
async def handle_direct_chat_stream(sid, data):
    """
    Global handler for direct chat streaming chunks.
    Routes messages to the correct queue based on channel.
    Works across multiple workers via Redis pub/sub.
    """
    channel = data.get("channel")
    if not channel:
        log.warning("Received direct-chat-stream without channel")
        return

    # Check if this worker has the queue for this channel
    if channel in DIRECT_CHAT_QUEUES:
        try:
            await DIRECT_CHAT_QUEUES[channel].put(data)
            log.debug(f"Queued data for channel {channel} on local worker")
        except Exception as e:
            log.error(f"Error queueing data for channel {channel}: {e}")
    else:
        # Queue is on another worker, forward via Redis pub/sub
        if WEBSOCKET_MANAGER == "redis":
            try:
                await REDIS.publish(
                    f"direct_chat_channel:{channel}",
                    json.dumps(data)
                )
                log.debug(f"Published data for channel {channel} to Redis")
            except Exception as e:
                log.error(f"Error publishing to Redis for channel {channel}: {e}")
        else:
            log.warning(f"Channel {channel} not found and Redis not available")

3. Add Redis Pub/Sub Listener

File: backend/open_webui/socket/main.py

Add new async function:

async def redis_direct_chat_listener():
    """
    Listen for direct chat messages published from other workers.
    Routes them to local queues if present.
    """
    if WEBSOCKET_MANAGER != "redis":
        return

    try:
        pubsub = REDIS.pubsub()
        await pubsub.psubscribe("direct_chat_channel:*")
        log.info("Started Redis direct chat listener")

        async for message in pubsub.listen():
            if message["type"] != "pmessage":
                continue

            try:
                # Extract channel from Redis key
                redis_channel = message["channel"]
                if isinstance(redis_channel, bytes):
                    redis_channel = redis_channel.decode()

                channel = redis_channel.replace("direct_chat_channel:", "")

                # Check if this worker has the queue
                if channel in DIRECT_CHAT_QUEUES:
                    data = json.loads(message["data"])
                    await DIRECT_CHAT_QUEUES[channel].put(data)
                    log.debug(f"Received and queued Redis message for channel {channel}")
            except Exception as e:
                log.error(f"Error processing Redis message: {e}")
    except Exception as e:
        log.error(f"Redis direct chat listener error: {e}")
    finally:
        try:
            await pubsub.punsubscribe()  # pairs with the psubscribe above
        except Exception:
            pass

4. Start Redis Listener on Startup

File: backend/open_webui/main.py

In the startup event handler:

@app.on_event("startup")
async def startup_event():
    # ... existing startup code ...

    # Start Redis listener for direct chat routing
    if WEBSOCKET_MANAGER == "redis":
        asyncio.create_task(redis_direct_chat_listener())
        log.info("Started Redis direct chat listener task")

5. Modify Direct Chat Completion Function

File: backend/open_webui/utils/chat.py

Import the global registry at the top:

from open_webui.socket.main import (
    sio,
    get_event_call,
    get_event_emitter,
    DIRECT_CHAT_QUEUES,  # Add this import
)

Replace lines 85-108 (the streaming handler setup):

if form_data.get("stream"):
    q = asyncio.Queue()
    channel = f"{user_id}:{session_id}:{request_id}"

    # Register queue in local registry (NO dynamic handler registration!)
    DIRECT_CHAT_QUEUES[channel] = q

    # Optional: Track in Redis for monitoring/debugging
    if WEBSOCKET_MANAGER == "redis":
        try:
            await REDIS.hset(
                f"direct_chat_active:{channel}",
                "worker_pid",
                str(os.getpid()),
            )
            # hset takes no TTL argument; set the expiry separately
            await REDIS.expire(f"direct_chat_active:{channel}", 300)  # 5 minute TTL
        except Exception as e:
            log.warning(f"Failed to track channel in Redis: {e}")

    # Start processing chat completion in background
    res = await event_caller(
        {
            "type": "request:chat:completion",
            "data": {
                "form_data": form_data,
                "model": models[form_data["model"]],
                "channel": channel,
                "session_id": session_id,
            },
        }
    )

    log.info(f"res: {res}")

    if res.get("status", False):
        # Define a generator to stream responses
        async def event_generator():
            nonlocal q
            try:
                while True:
                    data = await q.get()  # Wait for new messages
                    if isinstance(data, dict):
                        if "done" in data and data["done"]:
                            break  # Stop streaming when 'done' is received

                        yield f"data: {json.dumps(data)}\n\n"
                    elif isinstance(data, str):
                        if "data:" in data:
                            yield f"{data}\n\n"
                        else:
                            yield f"data: {data}\n\n"
            except Exception as e:
                log.debug(f"Error in event generator: {e}")
                pass

        # Define a background task to clean up
        async def background():
            try:
                # Clean up queue registry
                if channel in DIRECT_CHAT_QUEUES:
                    del DIRECT_CHAT_QUEUES[channel]
                    log.debug(f"Cleaned up queue for channel {channel}")

                # Clean up Redis tracking
                if WEBSOCKET_MANAGER == "redis":
                    try:
                        await REDIS.delete(f"direct_chat_active:{channel}")
                    except Exception as e:
                        log.warning(f"Failed to clean up Redis tracking: {e}")
            except Exception as e:
                log.error(f"Error in cleanup: {e}")

        # Return the streaming response
        return StreamingResponse(
            event_generator(), media_type="text/event-stream", background=background
        )
    else:
        # Clean up on failure
        if channel in DIRECT_CHAT_QUEUES:
            del DIRECT_CHAT_QUEUES[channel]
        raise Exception(str(res))
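The dict/str branching and the "done" sentinel in the event_generator above can be exercised in isolation. This simplified helper is hypothetical (the names are illustrative, not Open WebUI code) but mirrors that framing logic:

```python
import asyncio
import json

async def sse_frames(queue):
    # Mirrors the proposed event_generator: dicts are JSON-encoded,
    # strings pass through, and a {'done': True} sentinel ends the stream.
    frames = []
    while True:
        data = await queue.get()
        if isinstance(data, dict):
            if data.get("done"):
                break  # stop streaming when the 'done' sentinel arrives
            frames.append(f"data: {json.dumps(data)}\n\n")
        elif isinstance(data, str):
            # Raw SSE lines pass through; bare strings get a data: prefix.
            frames.append(f"{data}\n\n" if "data:" in data else f"data: {data}\n\n")
    return frames

async def demo():
    q = asyncio.Queue()
    for item in ({"choices": [1]}, "data: raw", "plain", {"done": True}):
        q.put_nowait(item)
    return await sse_frames(q)

frames = asyncio.run(demo())
```

Each entry in `frames` is a complete `data: ...\n\n` SSE event, matching what the StreamingResponse would send to the browser.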

6. Update Frontend to Use Global Handler

Frontend changes needed:

Find where the frontend emits streaming chunks for direct connections and update:

// OLD:
socket.emit(channel, data)

// NEW:
socket.emit("direct-chat-stream", {
    channel: channel,
    ...data
})

Example location to check:

  • Frontend socket.io client code handling direct connection responses
  • Look for socket.emit() calls with dynamic channel names

Testing the Fix

1. Deploy with Multiple Workers

# Set environment variables
export ENABLE_WEBSOCKET_SUPPORT=true
export WEBSOCKET_MANAGER=redis
export REDIS_URL=redis://localhost:6379/0
export WEBSOCKET_REDIS_URL=redis://localhost:6379/0

# Start with multiple workers
uvicorn open_webui.main:app --host 0.0.0.0 --port 8080 --workers 4

2. Test Direct Connection Streaming

  1. Enable direct connection mode in user settings
  2. Configure a direct connection to a LiteLLM or other OpenAI-compatible endpoint
  3. Send multiple concurrent chat requests
  4. Verify all requests complete successfully without timeouts

3. Verify Cross-Worker Routing

Add temporary debug logging to confirm routing works:

# In handle_direct_chat_stream
log.info(f"Worker {os.getpid()} received stream data for channel {channel}")

# In redis_direct_chat_listener
log.info(f"Worker {os.getpid()} received Redis message for channel {channel}")

Check logs to ensure:

  • Messages arrive at correct worker
  • Redis pub/sub forwards messages when needed
  • No timeout errors occur

4. Load Testing

# Use Apache Bench or similar
ab -n 100 -c 10 -p request.json -T application/json \
   http://localhost:8080/api/v1/completions

Verify all 100 requests complete successfully.


Summary

The bug occurs because dynamic sio.on() handlers are registered per-request in the worker that handles the HTTP request, but streaming responses come through the worker that holds the WebSocket connection. With multiple workers, these are often different processes, causing messages to be silently dropped.

The fix uses a global static handler (@sio.on("direct-chat-stream")) combined with Redis pub/sub to route messages to the correct worker's queue, ensuring all workers can handle streaming responses regardless of which worker holds the WebSocket connection.
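The routing scheme can be simulated without a Redis server. In this sketch, two in-process "workers" each hold a local DIRECT_CHAT_QUEUES-style dict, and a shared asyncio.Queue stands in for the Redis pub/sub channel; all names are illustrative. A chunk that arrives at the wrong worker is forwarded and still reaches the owning worker's queue:

```python
import asyncio

class Worker:
    def __init__(self, name, broker):
        self.name = name
        self.queues = {}      # local DIRECT_CHAT_QUEUES
        self.broker = broker  # stands in for Redis pub/sub

    async def on_stream_chunk(self, channel, data):
        # Global handler: deliver locally if we own the queue, else publish.
        if channel in self.queues:
            await self.queues[channel].put(data)
        else:
            await self.broker.put((channel, data))

    async def pubsub_listener(self):
        # Stands in for redis_direct_chat_listener on each worker.
        while True:
            channel, data = await self.broker.get()
            if channel in self.queues:
                await self.queues[channel].put(data)

async def main():
    broker = asyncio.Queue()
    a, b = Worker("A", broker), Worker("B", broker)
    listener = asyncio.create_task(a.pubsub_listener())

    # Worker A handled the HTTP request and owns the queue...
    chan = "user:sess:req"
    a.queues[chan] = asyncio.Queue()

    # ...but the streaming chunk arrives at worker B's WebSocket.
    await b.on_stream_chunk(chan, {"delta": "hello"})

    data = await asyncio.wait_for(a.queues[chan].get(), timeout=1)
    listener.cancel()
    return data

result = asyncio.run(main())
print(result)
```

Without the forwarding branch, worker B would drop the chunk and worker A's queue would wait forever, which is exactly the reported timeout.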

@jasonpnnl commented on GitHub (Oct 28, 2025): ## Root Cause Analysis The issue is caused by **dynamic event handler registration in `backend/open_webui/utils/chat.py:95`** that only registers handlers in the local worker process, not globally across all workers. ### Current Broken Flow **`backend/open_webui/utils/chat.py:85-98`:** ```python channel = f"{user_id}:{session_id}:{request_id}" if form_data.get("stream"): q = asyncio.Queue() async def message_listener(sid, data): await q.put(data) # BUG: This only registers the handler in the CURRENT worker! sio.on(channel, message_listener) res = await event_caller({...}) # Send RPC to browser ``` ### What Happens with Multiple Workers 1. HTTP request `POST /api/v1/completions` → **Worker A** (random routing) 2. Worker A registers handler `sio.on(channel, message_listener)` **locally in Worker A's memory** 3. Worker A sends `sio.call()` RPC to browser → routes through Redis to **Worker B** (where WebSocket lives) 4. Browser responds `{'status': True}` ✅ (RPC succeeds, returns to Worker A) 5. Browser starts streaming chunks, emits to channel `user_id:session_id:request_id` 6. Streaming chunks arrive at **Worker B** (where browser WebSocket is connected) 7. Worker B searches for handler for event `user_id:session_id:request_id` 8. ❌ **Handler only exists in Worker A!** 9. Events are silently dropped 10. Worker A's `await q.get()` waits forever 11. Request times out ### Why Other Socket.IO Events Work All other socket.io handlers use **static registration** at module load time: **`backend/open_webui/socket/main.py`:** ```python @sio.on("usage") # Registered globally when module loads @sio.on("user-join") # Registered globally when module loads @sio.on("events:channel") # Registered globally when module loads ``` These handlers exist in **all workers**, so events can be processed regardless of which worker receives them. 
---

## Proposed Fix: Use Global Handler + Redis Pub/Sub Routing

Replace per-request dynamic handlers with a global handler that routes messages via Redis.

### Implementation Steps

#### 1. Add Global Data Structures

**File: `backend/open_webui/socket/main.py`**

Add at module level, after `sio` initialization:

```python
# Global registry for direct chat queues (per-worker, in-memory)
DIRECT_CHAT_QUEUES = {}  # {channel: asyncio.Queue}

# Redis-backed distributed tracking (optional, for monitoring)
if WEBSOCKET_MANAGER == "redis":
    DIRECT_CHAT_CHANNELS = RedisDict(
        redis=REDIS,
        prefix=f"{REDIS_KEY_PREFIX}:direct_chat_channels",
        fallback={},
    )
else:
    DIRECT_CHAT_CHANNELS = {}
```

#### 2. Add Global Socket.IO Handler

**File: `backend/open_webui/socket/main.py`**

Add new global handler:

```python
@sio.on("direct-chat-stream")
async def handle_direct_chat_stream(sid, data):
    """
    Global handler for direct chat streaming chunks.
    Routes messages to the correct queue based on channel.
    Works across multiple workers via Redis pub/sub.
    """
    channel = data.get("channel")
    if not channel:
        log.warning("Received direct-chat-stream without channel")
        return

    # Check if this worker has the queue for this channel
    if channel in DIRECT_CHAT_QUEUES:
        try:
            await DIRECT_CHAT_QUEUES[channel].put(data)
            log.debug(f"Queued data for channel {channel} on local worker")
        except Exception as e:
            log.error(f"Error queueing data for channel {channel}: {e}")
    else:
        # Queue is on another worker, forward via Redis pub/sub
        if WEBSOCKET_MANAGER == "redis":
            try:
                await REDIS.publish(
                    f"direct_chat_channel:{channel}",
                    json.dumps(data),
                )
                log.debug(f"Published data for channel {channel} to Redis")
            except Exception as e:
                log.error(f"Error publishing to Redis for channel {channel}: {e}")
        else:
            log.warning(f"Channel {channel} not found and Redis not available")
```

#### 3. Add Redis Pub/Sub Listener

**File: `backend/open_webui/socket/main.py`**

Add new async function:

```python
async def redis_direct_chat_listener():
    """
    Listen for direct chat messages published from other workers.
    Routes them to local queues if present.
    """
    if WEBSOCKET_MANAGER != "redis":
        return

    pubsub = None
    try:
        pubsub = REDIS.pubsub()
        await pubsub.psubscribe("direct_chat_channel:*")
        log.info("Started Redis direct chat listener")

        async for message in pubsub.listen():
            if message["type"] != "pmessage":
                continue
            try:
                # Extract channel from Redis key
                redis_channel = message["channel"]
                if isinstance(redis_channel, bytes):
                    redis_channel = redis_channel.decode()
                channel = redis_channel.replace("direct_chat_channel:", "")

                # Check if this worker has the queue
                if channel in DIRECT_CHAT_QUEUES:
                    data = json.loads(message["data"])
                    await DIRECT_CHAT_QUEUES[channel].put(data)
                    log.debug(f"Received and queued Redis message for channel {channel}")
            except Exception as e:
                log.error(f"Error processing Redis message: {e}")
    except Exception as e:
        log.error(f"Redis direct chat listener error: {e}")
    finally:
        if pubsub is not None:
            try:
                # Pattern subscriptions are removed with punsubscribe
                await pubsub.punsubscribe()
            except Exception:
                pass
```

#### 4. Start Redis Listener on Startup

**File: `backend/open_webui/main.py`**

In the startup event handler:

```python
@app.on_event("startup")
async def startup_event():
    # ... existing startup code ...

    # Start Redis listener for direct chat routing
    if WEBSOCKET_MANAGER == "redis":
        asyncio.create_task(redis_direct_chat_listener())
        log.info("Started Redis direct chat listener task")
```

#### 5. Modify Direct Chat Completion Function

**File: `backend/open_webui/utils/chat.py`**

**Import the global registry at the top:**

```python
from open_webui.socket.main import (
    sio,
    get_event_call,
    get_event_emitter,
    DIRECT_CHAT_QUEUES,  # Add this import
)
```

**Replace lines 85-108 (the streaming handler setup):**

```python
if form_data.get("stream"):
    q = asyncio.Queue()
    channel = f"{user_id}:{session_id}:{request_id}"

    # Register queue in local registry (NO dynamic handler registration!)
    DIRECT_CHAT_QUEUES[channel] = q

    # Optional: Track in Redis for monitoring/debugging
    if WEBSOCKET_MANAGER == "redis":
        try:
            await REDIS.hset(
                f"direct_chat_active:{channel}",
                "worker_pid",
                str(os.getpid()),
            )
            # hset does not accept a TTL; set the 5 minute expiry separately
            await REDIS.expire(f"direct_chat_active:{channel}", 300)
        except Exception as e:
            log.warning(f"Failed to track channel in Redis: {e}")

    # Start processing chat completion in background
    res = await event_caller(
        {
            "type": "request:chat:completion",
            "data": {
                "form_data": form_data,
                "model": models[form_data["model"]],
                "channel": channel,
                "session_id": session_id,
            },
        }
    )
    log.info(f"res: {res}")

    if res.get("status", False):
        # Define a generator to stream responses
        async def event_generator():
            nonlocal q
            try:
                while True:
                    data = await q.get()  # Wait for new messages
                    if isinstance(data, dict):
                        if "done" in data and data["done"]:
                            break  # Stop streaming when 'done' is received
                        yield f"data: {json.dumps(data)}\n\n"
                    elif isinstance(data, str):
                        if "data:" in data:
                            yield f"{data}\n\n"
                        else:
                            yield f"data: {data}\n\n"
            except Exception as e:
                log.debug(f"Error in event generator: {e}")

        # Define a background task to clean up
        async def background():
            try:
                # Clean up queue registry
                if channel in DIRECT_CHAT_QUEUES:
                    del DIRECT_CHAT_QUEUES[channel]
                    log.debug(f"Cleaned up queue for channel {channel}")

                # Clean up Redis tracking
                if WEBSOCKET_MANAGER == "redis":
                    try:
                        await REDIS.delete(f"direct_chat_active:{channel}")
                    except Exception as e:
                        log.warning(f"Failed to clean up Redis tracking: {e}")
            except Exception as e:
                log.error(f"Error in cleanup: {e}")

        # Return the streaming response
        return StreamingResponse(
            event_generator(),
            media_type="text/event-stream",
            background=background,
        )
    else:
        # Clean up on failure
        if channel in DIRECT_CHAT_QUEUES:
            del DIRECT_CHAT_QUEUES[channel]
        raise Exception(str(res))
```

#### 6. Update Frontend to Use Global Handler

**Frontend changes needed:**

Find where the frontend emits streaming chunks for direct connections and update:

```javascript
// OLD:
socket.emit(channel, data)

// NEW:
socket.emit("direct-chat-stream", { channel: channel, ...data })
```

**Example location to check:**
- Frontend socket.io client code handling direct connection responses
- Look for `socket.emit()` calls with dynamic channel names

---

## Testing the Fix

### 1. Deploy with Multiple Workers

```bash
# Set environment variables
export ENABLE_WEBSOCKET_SUPPORT=true
export WEBSOCKET_MANAGER=redis
export REDIS_URL=redis://localhost:6379/0
export WEBSOCKET_REDIS_URL=redis://localhost:6379/0

# Start with multiple workers
uvicorn open_webui.main:app --host 0.0.0.0 --port 8080 --workers 4
```

### 2. Test Direct Connection Streaming

1. Enable direct connection mode in user settings
2. Configure a direct connection to a LiteLLM or other OpenAI-compatible endpoint
3. Send multiple concurrent chat requests
4. Verify all requests complete successfully without timeouts

### 3. Verify Cross-Worker Routing

Add temporary debug logging to confirm routing works:

```python
# In handle_direct_chat_stream
log.info(f"Worker {os.getpid()} received stream data for channel {channel}")

# In redis_direct_chat_listener
log.info(f"Worker {os.getpid()} received Redis message for channel {channel}")
```

Check logs to ensure:
- Messages arrive at the correct worker
- Redis pub/sub forwards messages when needed
- No timeout errors occur

### 4. Load Testing

```bash
# Use Apache Bench or similar
ab -n 100 -c 10 -p request.json -T application/json \
  http://localhost:8080/api/v1/completions
```

Verify all 100 requests complete successfully.

---

## References

- [Socket.IO Multi-Worker Documentation](https://socket.io/docs/v4/using-multiple-nodes/)
- [Python Socket.IO with Redis](https://python-socketio.readthedocs.io/en/latest/server.html#emitting-from-external-processes)
- [Redis Pub/Sub Pattern](https://redis.io/docs/manual/pubsub/)

---

## Summary

The bug occurs because dynamic `sio.on()` handlers are registered per-request in the worker that handles the HTTP request, while streaming responses arrive at the worker that holds the WebSocket connection. With multiple workers, these are often different processes, so messages are silently dropped.

The fix uses a global static handler (`@sio.on("direct-chat-stream")`) combined with Redis pub/sub to route messages to the correct worker's queue, ensuring all workers can handle streaming responses regardless of which worker holds the WebSocket connection.
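The proposed routing can also be sketched end-to-end without Redis or Socket.IO: a shared in-memory "bus" stands in for Redis pub/sub, a global handler checks the local queue registry first and forwards otherwise, and a per-worker listener drains forwarded messages into the queues it owns. All class and channel names here are illustrative, not part of Open WebUI.

```python
import asyncio


class Bus:
    """Toy stand-in for Redis pub/sub: fan out to one inbox per worker."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        inbox = asyncio.Queue()
        self.subscribers.append(inbox)
        return inbox

    async def publish(self, channel, data):
        for inbox in self.subscribers:
            await inbox.put((channel, data))


class Worker:
    def __init__(self, bus):
        self.queues = {}  # channel -> local asyncio.Queue (this worker only)
        self.bus = bus
        self.bus_inbox = bus.subscribe()

    async def handle_stream(self, channel, data):
        """Global handler: deliver locally if we own the queue, else forward."""
        if channel in self.queues:
            await self.queues[channel].put(data)
        else:
            await self.bus.publish(channel, data)

    async def listener(self):
        """Pub/sub listener: enqueue forwarded messages for channels we own."""
        while True:
            channel, data = await self.bus_inbox.get()
            if channel in self.queues:
                await self.queues[channel].put(data)


async def simulate():
    bus = Bus()
    worker_a, worker_b = Worker(bus), Worker(bus)
    tasks = [asyncio.create_task(w.listener()) for w in (worker_a, worker_b)]

    # Worker A handled the HTTP request, so it owns the queue for this channel.
    worker_a.queues["user:session:request"] = asyncio.Queue()

    # The chunk arrives at Worker B (where the WebSocket lives) and is forwarded.
    await worker_b.handle_stream("user:session:request", {"chunk": 1})

    # Worker A's queue receives it via the bus instead of being dropped.
    data = await asyncio.wait_for(
        worker_a.queues["user:session:request"].get(), timeout=1
    )
    for t in tasks:
        t.cancel()
    return data


result = asyncio.run(simulate())
print(result)  # {'chunk': 1}
```

Unlike the broken flow, the chunk that lands on the "wrong" worker is forwarded and reaches the owning worker's queue, so the SSE generator would unblock instead of timing out.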

@jasonpnnl commented on GitHub (Oct 28, 2025):

This is an AI-generated analysis of the cause that seems plausible from what I understand. I can confirm that running multiple uvicorn workers triggers the described issue, but I have not verified the root cause or the suggested fix above. I'm posting it in case it helps a dev find and fix the issue.

Reference: github-starred/open-webui#5601