[GH-ISSUE #23751] perf: MCP tool server reconnects on every message causing 15-20s silent delay #58727

Closed
opened 2026-05-05 23:47:38 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @DSavaliya-gh on GitHub (Apr 15, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23751

Bug Description

When an MCP (Model Context Protocol) tool server is enabled in Open WebUI, every chat message—including follow-ups in the same conversation—triggers a full MCP connection lifecycle from scratch:

  1. MCPClient() — new instance allocated
  2. .connect() — TCP handshake + TLS + HTTP session + MCP initialize() protocol exchange (10 s timeout cap)
  3. .list_tool_specs() — list_tools() round-trip to enumerate all tool definitions

This entire sequence runs before the model API call is made and no event_emitter status event is emitted during the wait. The user sees a blinking dot with zero feedback for 15–20 seconds, then the model starts thinking.
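The per-message lifecycle described above can be sketched as follows. This is an illustrative model, not the project's actual MCPClient implementation — the class and method names mirror the issue text, but the bodies are stubs, and CONNECT_COUNT is added purely to make the repeated handshake visible:

```python
import asyncio

CONNECT_COUNT = 0  # total handshakes performed across all messages (illustrative)

class MCPClient:
    """Stand-in for Open WebUI's MCP client; bodies are stubs."""
    def __init__(self):
        self.connected = False

    async def connect(self, url: str, timeout: float = 10.0) -> None:
        # Real client: TCP + TLS + HTTP session + MCP initialize(), capped at 10 s.
        global CONNECT_COUNT
        CONNECT_COUNT += 1
        self.connected = True

    async def list_tool_specs(self) -> list:
        # Real client: a list_tools() round-trip against the server.
        assert self.connected
        return [{"name": "example_tool"}]

async def handle_message(clients: dict, server_id: str) -> list:
    # Current behavior: a brand-new client + full handshake on *every* message.
    clients[server_id] = MCPClient()
    await clients[server_id].connect("https://mcp.example.com")
    return await clients[server_id].list_tool_specs()

clients = {}
for _ in range(2):  # two messages in the same conversation
    asyncio.run(handle_message(clients, "srv"))
print(CONNECT_COUNT)  # 2 — the follow-up message repeated the whole handshake
```

The second iteration is the "warm follow-up" case: nothing from the first connection is reused.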

Steps to Reproduce

  1. Add any MCP tool server in Admin Panel → Settings → Tools (bearer auth)
  2. Assign that server to a model
  3. Start a chat and send any message
  4. Send a second message in the same conversation — same 15-20 s delay repeats

Expected Behavior

  • On warm follow-up messages, reuse the existing connection — delay should be <1 s
  • During any unavoidable connection wait, show a visible status like "Connecting to [server]..."

Actual Behavior

  • 15–20 second silent blinking dot on every message, including follow-ups
  • No user-visible progress indicator during the MCP handshake phase

Root Cause

backend/open_webui/utils/middleware.py inside process_chat_payload:

mcp_clients[server_id] = MCPClient()          # always a brand-new instance
await mcp_clients[server_id].connect(...)     # full handshake on every message
tool_specs = await mcp_clients[server_id].list_tool_specs()  # list_tools round-trip

No event_emitter call is made before this block — the frontend has nothing to render.

Proposed Fix

Two complementary changes to middleware.py:

1. Status events (UX — visible immediately)

Emit type: 'status' events bracketing the connect call so the existing StatusHistory component shows activity:

await event_emitter({"type": "status", "data": {"action": "mcp_connect", "description": f"Connecting to '{server_id}'...", "done": False}})
# ... connect + list_tool_specs ...
await event_emitter({"type": "status", "data": {"action": "mcp_connect", "description": f"Connected to '{server_id}'", "done": True}})

2. Connection pool for bearer-auth servers (performance)

Cache the MCPClient on app.state.mcp_client_pool, keyed by server_id, when auth_type == 'bearer' (static credentials that do not change per-user). Per-user auth types (session, oauth_2.1) are explicitly excluded from pooling to preserve security.

Stale pool entries are evicted automatically on connection errors.
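A minimal sketch of the pooling logic described above, assuming the `app.state.mcp_client_pool` attribute named in this issue; the `get_mcp_client` helper and the stub `MCPClient`/`AppState` classes are hypothetical, not existing Open WebUI APIs:

```python
import asyncio

class AppState:
    """Stand-in for FastAPI's app.state."""
    def __init__(self):
        self.mcp_client_pool = {}  # server_id -> MCPClient (bearer auth only)

class MCPClient:
    """Stub client; real connect() performs the full MCP handshake."""
    def __init__(self):
        self.connected = False

    async def connect(self, url: str) -> None:
        self.connected = True

async def get_mcp_client(state: AppState, server_id: str,
                         url: str, auth_type: str) -> MCPClient:
    """Reuse a pooled client for static bearer auth; fresh connection otherwise."""
    if auth_type == "bearer":
        cached = state.mcp_client_pool.get(server_id)
        if cached is not None and cached.connected:
            return cached  # warm path: no handshake delay
    client = MCPClient()
    try:
        await client.connect(url)
    except Exception:
        # Evict any stale pooled entry so the next call retries cleanly.
        state.mcp_client_pool.pop(server_id, None)
        raise
    if auth_type == "bearer":
        state.mcp_client_pool[server_id] = client  # never pool per-user auth
    return client

state = AppState()
a = asyncio.run(get_mcp_client(state, "srv", "https://mcp.example.com", "bearer"))
b = asyncio.run(get_mcp_client(state, "srv", "https://mcp.example.com", "bearer"))
print(a is b)  # True — the bearer-auth client is reused across messages
```

Per-user auth types fall through to the fresh-connection path and are never written into the pool, which is what keeps the security boundary intact.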

Security Consideration

Only static bearer credentials are pooled. Per-user OAuth and session tokens are never cached — those paths always get a fresh connection.

Linked Fix

A ready-to-review PR is available at: https://github.com/DSavaliya-gh/open-webui/tree/fix/mcp-connection-pool-status-events


@0xbrainkid commented on GitHub (Apr 15, 2026):

Reconnecting on every message is a significant latency cost — 15-20s per message is non-interactive even with a fast LLM. The root cause is worth diagnosing: is it a missing session keep-alive (the server closes idle connections), a client that does not reuse connections (always creates fresh), or a load balancer that does not support persistent connections?

From an agent identity perspective, reconnect-on-every-message is also an identity cost. Each reconnect involves a fresh MCP initialize handshake, which re-establishes the client-server session but does not carry any identity context from the previous session. An MCP server that performs identity verification at session initialization (checking X-Agent-ID, verifying attestation, looking up behavioral trust score) repeats this work on every message — which is wasteful and also prevents building a within-session behavioral track record.

Two improvements that would help:

1. Persistent session with identity binding at init. If the session is kept alive, the identity verification happens once at initialize and the result is cached for the session lifetime. The session ID becomes an identity token — holding the session is equivalent to holding the identity.

2. Session ID as identity-bound token. When a fresh session is created, the server binds the verified agent identity to the session ID. If the connection drops and the client reconnects with the same session ID (within a TTL), the server restores the identity context without re-verification. This makes reconnect fast while preserving identity continuity — the session ID carries the identity proof.

The 15-20s delay suggests the reconnect is doing a full initialization including any tool discovery or capability negotiation — persisting that state across reconnects would help too.
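The second improvement above could be sketched as a server-side TTL cache. The `SessionIdentityStore` name and API are hypothetical — this is not an existing MCP SDK feature, just one way to bind a verified identity to a session ID:

```python
import time
import uuid
from typing import Optional

class SessionIdentityStore:
    """Hypothetical server-side cache binding a verified agent identity
    to a session ID, so reconnects within a TTL skip re-verification."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # session_id -> (agent_id, expiry)

    def bind(self, agent_id: str) -> str:
        # Called once, after full identity verification at initialize.
        session_id = uuid.uuid4().hex
        self._store[session_id] = (agent_id, time.monotonic() + self.ttl)
        return session_id

    def restore(self, session_id: str) -> Optional[str]:
        # On reconnect: return the bound identity without re-verification,
        # or None if the session is unknown or past its TTL.
        entry = self._store.get(session_id)
        if entry is None:
            return None
        agent_id, expires = entry
        if time.monotonic() > expires:
            del self._store[session_id]
            return None
        return agent_id

store = SessionIdentityStore(ttl_seconds=300)
sid = store.bind("agent-42")   # expensive verification happens once
print(store.restore(sid))      # agent-42 — fast reconnect restores identity
```

An unknown or expired session ID returns None, forcing the slow full-verification path, so the TTL bounds how long a leaked session ID carries identity.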


@Classic298 commented on GitHub (Apr 15, 2026):

STOP OPENING / SPAMMING THE ISSUES SECTION

Final warning


@Classic298 commented on GitHub (Apr 15, 2026):

@0xbrainkid bot?


@Classic298 commented on GitHub (Apr 15, 2026):

@DSavaliya-gh besides my warning to stop spamming the issues, also stop spamming PRs and READ THE PR TEMPLATE. We will aggressively close all automated PRs or PRs that do not follow the PR template


@Classic298 commented on GitHub (Apr 15, 2026):

Also my slop alarm is going off here. 15-20 seconds for one MCP connection is wildly absurd.

The initialize call has a fail_after(10) timeout (line 73 in client.py), so the theoretical maximum for a single connection is ~10 seconds before it'd hard-fail

So 15-20 seconds is totally absolutely theoretically and practically impossible
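The cap @Classic298 describes can be illustrated with `asyncio.wait_for`, the stdlib analogue of `fail_after(10)` (a stand-in, not the project's actual client code; `initialize` here is a stub):

```python
import asyncio

async def initialize(delay: float) -> str:
    # Stand-in for the MCP initialize() protocol exchange.
    await asyncio.sleep(delay)
    return "connected"

async def connect_with_cap(delay: float, cap: float = 10.0) -> str:
    # asyncio.wait_for is the stdlib analogue of anyio.fail_after(cap):
    # the coroutine is cancelled once the cap elapses.
    try:
        return await asyncio.wait_for(initialize(delay), timeout=cap)
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(connect_with_cap(0.01)))            # connected
print(asyncio.run(connect_with_cap(0.05, cap=0.01)))  # timed out
```

With a 10 s cap, a single connect either completes under the cap or hard-fails at ~10 s; it cannot silently run 15-20 s, which supports the point that the reported delay must come from something beyond one capped connect.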


@Classic298 commented on GitHub (Apr 15, 2026):

HTTP-based MCP streamable connection to a bearer-auth server should take well under 1 second. It's an HTTP request, not a TCP+TLS+MCP handshake taking 15-20s

Reference: github-starred/open-webui#58727