[PR #24105] fix(mcp): fix response discarded when MCP cleanup crashes in process_chat finally block #43137

Open
opened 2026-04-25 14:49:26 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/24105
Author: @looselyhuman
Created: 4/24/2026
Status: 🔄 Open

Base: dev ← Head: gaia-patch-2


📝 Commits (10+)

  • fe6783c Merge pull request #19030 from open-webui/dev
  • fc05e0a Merge pull request #19405 from open-webui/dev
  • e3faec6 Merge pull request #19416 from open-webui/dev
  • 9899293 Merge pull request #19448 from open-webui/dev
  • 140605e Merge pull request #19462 from open-webui/dev
  • 6f1486f Merge pull request #19466 from open-webui/dev
  • d95f533 Merge pull request #19729 from open-webui/dev
  • a727153 0.6.43 (#20093)
  • 6adde20 Merge pull request #20394 from open-webui/dev
  • f9b0534 Merge pull request #20522 from open-webui/dev

📊 Changes

1 file changed (+15 additions, -12 deletions)

View changed files

📝 backend/open_webui/main.py (+15 -12)

📄 Description

Pull Request Checklist

  • [x] Target branch: This PR targets the dev branch.
  • [x] Description: Provided below.
  • [x] Changelog: Included below.
  • [ ] Documentation: No user-facing behavior, environment variables, or public API changes requiring docs updates.
  • [ ] Dependencies: No new or upgraded dependencies.
  • [x] Testing: Manual testing performed and described below.
  • [x] Agentic AI Code: I personally reviewed and manually tested all changes in this PR.
  • [x] Code review: Self-review completed; changes follow project coding standards.
  • [x] Design & Architecture: Single focused bug fix; no new settings or architectural changes.
  • [x] Git Hygiene: Single logical change, rebased on dev.
  • [x] Title Prefix: fix prefix used.

Problem

When native MCP function calling completes successfully, the chat endpoint sometimes returns 500 Internal Server Error with "No response returned." — despite the LLM having produced a valid response. The completed response is silently discarded.

Root Cause

The finally block in chat_completion cleaned up MCP clients using:

await asyncio.wait_for(asyncio.shield(_cleanup_mcp()), timeout=10.0)

asyncio.wait_for() and asyncio.shield() both create new asyncio Tasks. The MCPClient's exit stack contains anyio resources (streamable_http transport) that use anyio cancel scopes. anyio cancel scopes are owned by the task that entered them — exiting them from a different task raises:

Attempted to exit a cancel scope that isn't the current task's current cancel scope

This is a BaseException, not an Exception. It propagates through the finally block, overwrites the return value of the already-completed process_chat coroutine, and surfaces as a 500 with an empty body.
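
The failure mode above can be reproduced in isolation. This is a minimal sketch, not the actual Open WebUI code: `CancelScopeError` stands in for anyio's cancel scope violation, and the try/finally mirrors the shape of `process_chat`. It shows how an exception escaping a finally block discards a return value that was already computed:

```python
import asyncio


class CancelScopeError(BaseException):
    """Stand-in for anyio's cancel-scope violation (a BaseException)."""


async def _cleanup_mcp():
    raise CancelScopeError(
        "Attempted to exit a cancel scope that isn't the current "
        "task's current cancel scope"
    )


async def process_chat():
    try:
        return {"content": "valid LLM response"}
    finally:
        # shield()/wait_for() run the cleanup in a child task; the
        # BaseException it raises escapes the finally block and
        # replaces the already-computed return value above.
        await asyncio.wait_for(asyncio.shield(_cleanup_mcp()), timeout=10.0)


try:
    result = asyncio.run(process_chat())
except BaseException as e:
    print(type(e).__name__)  # CancelScopeError: the response dict is lost
```

The caller never sees the response dict; it sees only the escaped BaseException, which the endpoint layer turns into a 500 with an empty body.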

Fix

Replace the asyncio.wait_for(asyncio.shield(...)) wrapper with a plain loop that calls client.disconnect() directly in the current task. MCPClient.disconnect() already catches BaseException internally (see companion PR #24104 fixing client.py), so no wrapper is needed here. An outer except BaseException guards against any unexpected escapes.
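
A minimal sketch of the replacement pattern, assuming a list of MCP clients each exposing a `disconnect()` coroutine as described above (names here are illustrative, not the exact source):

```python
import asyncio
import logging

log = logging.getLogger(__name__)


async def cleanup_mcp_clients(mcp_clients):
    """Disconnect each MCP client in the *current* task.

    No asyncio.wait_for/asyncio.shield wrapper: those spawn child
    tasks, and anyio cancel scopes must be exited by the task that
    entered them.
    """
    for client in mcp_clients:
        try:
            await client.disconnect()
        except BaseException as e:
            # Guard against unexpected escapes so nothing can leak out
            # of a finally block and overwrite a completed response.
            log.debug("MCP client cleanup failed: %s", e)
```

Because the loop awaits `disconnect()` directly, any cancel scope entered during connect is exited by the same task, and the outer guard ensures cleanup failures are logged rather than surfaced to the caller.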


Changelog Entry

Description

Bug fix for MCP client cleanup in process_chat discarding a valid LLM response and returning 500 due to anyio cancel scope violations in the finally block.

Fixed

  • MCP cleanup in the process_chat finally block no longer uses asyncio.wait_for/asyncio.shield, which spawned child tasks that violated anyio cancel scope ownership and caused BaseException to propagate through the finally block, overwriting a valid completed response with a 500.

Testing

I tested this on my self-hosted OpenWebUI instance running behind a Cloudflare reverse proxy tunnel, with an MCP server (FastMCP, stateless HTTP transport, Bearer token auth) connected via the Tool Servers UI. With the model set to Gemma4-26b via local Ollama and function_calling: native, I sent chat messages with a tool selected. Before this fix (but after applying PR #24104), successful tool calls would occasionally result in a 500 response being returned to the client even though the LLM had produced a valid answer. After applying both PR #24104 and this fix, the chat endpoint returns the correct response consistently. Tested on the non-streaming path (stream: false via direct API call) specifically.
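
The non-streaming path described above can be exercised with a direct API call. This is a hedged sketch: the `/api/chat/completions` endpoint path follows Open WebUI's OpenAI-compatible API, and the host, token, model name, and message are placeholders to adjust for your instance:

```shell
# Non-streaming chat completion against a local Open WebUI instance.
# $OPENWEBUI_TOKEN is an API key from Settings > Account.
curl -s http://localhost:8080/api/chat/completions \
  -H "Authorization: Bearer $OPENWEBUI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma4:26b",
        "stream": false,
        "messages": [{"role": "user", "content": "Call the tool, please."}]
      }'
```

Before the fix, this call could return a 500 with an empty body after a successful tool call; after the fix, it should return the completed response JSON.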


Contributor License Agreement

  • [x] By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 14:49:26 -05:00

Reference: github-starred/open-webui#43137