[PR #24104] fix(mcp): fix verify endpoint 500 — replace asyncio.wait_for/shield with BaseException handler in MCPClient.disconnect() #43136

Open
opened 2026-04-25 14:49:23 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/24104
Author: @looselyhuman
Created: 4/24/2026
Status: 🔄 Open

Base: dev ← Head: gaia-patch-1


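📝 Commits (10+)

  • fe6783c Merge pull request #19030 from open-webui/dev
  • fc05e0a Merge pull request #19405 from open-webui/dev
  • e3faec6 Merge pull request #19416 from open-webui/dev
  • 9899293 Merge pull request #19448 from open-webui/dev
  • 140605e Merge pull request #19462 from open-webui/dev
  • 6f1486f Merge pull request #19466 from open-webui/dev
  • d95f533 Merge pull request #19729 from open-webui/dev
  • a727153 0.6.43 (#20093)
  • 6adde20 Merge pull request #20394 from open-webui/dev
  • f9b0534 Merge pull request #20522 from open-webui/dev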

📊 Changes

1 file changed (+22 additions, -17 deletions)

📝 backend/open_webui/utils/mcp/client.py (+22 -17)

📄 Description

Pull Request Checklist

  • Target branch: This PR targets the dev branch.
  • Description: Provided below.
  • Changelog: Included below.
  • Documentation: No user-facing behavior, environment variables, or public API changes requiring docs updates.
  • Dependencies: No new or upgraded dependencies.
  • Testing: Manual testing performed and described below.
  • Agentic AI Code: I personally reviewed and manually tested all changes in this PR.
  • Code review: Self-review completed; changes follow project coding standards.
  • Design & Architecture: Single focused bug fix; no new settings or architectural changes.
  • Git Hygiene: Single logical change, rebased on dev.
  • Title Prefix: fix prefix used.

Problem

Calling the MCP verify endpoint (or any path that triggers MCPClient.disconnect()) intermittently returns HTTP 500 with an empty/partial manifest. The error trace shows a BaseExceptionGroup escaping, which Starlette catches and converts to a 500 response.

Root Cause

The previous disconnect() implementation wrapped exit_stack.aclose() with asyncio.wait_for(asyncio.shield(...)). Both asyncio.wait_for() and asyncio.shield() create a new asyncio Task to run the coroutine. The MCPClient's exit stack holds anyio resources (the streamable_http transport) that use anyio cancel scopes.
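The task switch can be observed with plain asyncio. The following is a minimal sketch, not code from the PR: report_task() is a hypothetical helper, and the example covers only asyncio.shield(), the inner wrapper in the nesting described above.

```python
import asyncio


async def report_task() -> asyncio.Task:
    # Return the Task object that is actually executing this coroutine.
    return asyncio.current_task()


async def main():
    caller = asyncio.current_task()

    # Awaited directly: the coroutine runs inside the caller's own task.
    direct = await report_task()

    # Passed through asyncio.shield(): the coroutine is wrapped in a new task.
    shielded = await asyncio.shield(report_task())

    return caller, direct, shielded


caller, direct, shielded = asyncio.run(main())
same_direct = direct is caller
same_shielded = shielded is caller
print("direct call ran in caller's task:", same_direct)
print("shielded call ran in caller's task:", same_shielded)
```

Any cleanup that must run in the task that opened a resource therefore cannot be routed through asyncio.shield().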

anyio cancel scopes are owned by the task that entered them. When exit_stack.aclose() runs in a different task (the one created by wait_for/shield), anyio raises:

Attempted to exit a cancel scope that isn't the current task's current cancel scope

This is a BaseException, not an Exception, so it bypasses the except Exception handler and escapes as a BaseExceptionGroup → Starlette returns 500.

Fix

  • Remove asyncio.wait_for / asyncio.shield wrapper — call exit_stack.aclose() directly in the original task.
  • Change except Exception to except BaseException so any remaining errors from the anyio transport internals (cancelled generators, cancel scope mismatches) are caught and logged rather than propagated.
  • Update the docstring to document why asyncio.wait_for, anyio.fail_after, and asyncio.shield are all unsafe in this context.
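The shape of the fixed method can be sketched as follows. This is an illustration under assumptions, not the patched client.py: SketchClient, fake_transport() and TransportCleanupError are hypothetical stand-ins, and the real MCPClient holds an anyio-based transport rather than this toy context manager.

```python
import asyncio
import logging
from contextlib import AsyncExitStack, asynccontextmanager

log = logging.getLogger(__name__)


class TransportCleanupError(BaseException):
    """Hypothetical stand-in for a non-Exception error raised during cleanup."""


@asynccontextmanager
async def fake_transport():
    # Hypothetical transport whose teardown fails with a BaseException.
    try:
        yield "session"
    finally:
        raise TransportCleanupError("cleanup failed")


class SketchClient:
    def __init__(self) -> None:
        self.exit_stack = AsyncExitStack()
        self.session = None

    async def connect(self) -> None:
        self.session = await self.exit_stack.enter_async_context(fake_transport())

    async def disconnect(self) -> None:
        try:
            # Close in the current task: no wait_for/shield wrapper.
            await self.exit_stack.aclose()
        except BaseException as exc:
            # Log instead of propagating, mirroring the broadened handler.
            log.debug("Error during disconnect: %s", exc)
        finally:
            self.session = None


async def main():
    client = SketchClient()
    await client.connect()
    await client.disconnect()  # does not raise
    return client.session


result = asyncio.run(main())
print("session after disconnect:", result)
```

One trade-off of this design is that except BaseException also swallows asyncio.CancelledError and KeyboardInterrupt raised while the handler is active, so it is best kept confined to the cleanup call.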

Changelog Entry

Description

Bug fix for MCPClient.disconnect() causing 500 errors on the MCP verify endpoint when using anyio-based transports.

Fixed

  • MCPClient.disconnect() no longer spawns a child task via asyncio.wait_for/asyncio.shield, which caused anyio cancel scope ownership violations and intermittent 500 responses on the verify endpoint.
  • Broadened exception handler from except Exception to except BaseException to catch anyio transport cleanup errors that would otherwise escape.

Testing

I tested this on my self-hosted OpenWebUI instance running behind a Cloudflare reverse proxy tunnel. I connected an MCP server using FastMCP with stateless HTTP transport, authenticated via Bearer token, through the Admin → Tool Servers UI. Before this fix, clicking "Verify" on the tool server would intermittently return 500 with an empty or partial manifest. After applying this fix, the verify endpoint consistently returns 200 with all tool schemas populated. I repeated the verify operation multiple times to confirm the fix was stable.


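Contributor License Agreement

  • By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA) (https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms.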


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 14:49:23 -05:00
Reference: github-starred/open-webui#43136