[PR #24106] fix(mcp): execute MCP tool calls server-side in non-streaming native function calling mode #43138

Open
opened 2026-04-25 14:49:28 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/24106
Author: @looselyhuman
Created: 4/24/2026
Status: 🔄 Open

Base: dev ← Head: gaia-patch-3


📝 Commits (10+)

📊 Changes

1 file changed (+170 additions, -0 deletions)

View changed files

📝 backend/open_webui/utils/middleware.py (+170 -0)

📄 Description

Pull Request Checklist

  • Target branch: This PR targets the dev branch.
  • Description: Provided below.
  • Changelog: Included below.
  • Documentation: No user-facing behavior, environment variables, or public API changes requiring docs updates.
  • Dependencies: No new or upgraded dependencies.
  • Testing: Manual testing performed and described below.
  • Agentic AI Code: I personally reviewed and manually tested all changes in this PR.
  • Code review: Self-review completed; changes follow project coding standards.
  • Design & Architecture: Single focused bug fix; no new settings or architectural changes.
  • Git Hygiene: Single logical change, rebased on dev.
  • Title Prefix: fix prefix used.

Problem

When using native function calling (function_calling=native) with a non-streaming LLM response (e.g. Ollama/Gemma), MCP tool calls are never executed. The model returns finish_reason=tool_calls but the tools are never invoked — the MCP server only ever receives ListToolsRequest, never CallToolRequest. The model's final answer is therefore ungrounded and ignores the tool results entirely.

Additionally, when Ollama prefixes tool names with the server name (e.g. "oracle:oracle_vault_ask"), the tools_dict lookup fails silently in the streaming path as well.
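The prefix handling described above can be sketched as a small lookup helper. This is an illustrative sketch, not the PR's actual code; the function name and tools_dict shape are assumptions:

```python
def normalize_tool_name(name: str, tools_dict: dict) -> str:
    """Strip an Ollama-style 'servername:' prefix when the bare tool
    name exists in tools_dict (hypothetical helper; names illustrative)."""
    if name not in tools_dict and ":" in name:
        candidate = name.split(":", 1)[1]
        if candidate in tools_dict:
            return candidate
    return name
```

With this lookup, "oracle:oracle_vault_ask" resolves to "oracle_vault_ask" instead of failing silently.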

Root Cause

The tool-execution loop in middleware.py lives entirely inside streaming_chat_response_handler. non_streaming_chat_response_handler only inspects choices[0].message.content and has no code path for finish_reason=tool_calls. Tool calls in non-streaming responses are silently dropped.

Fix

  • Add _execute_tool_calls_non_streaming() helper that:
    1. Extracts tool_calls from the non-streaming LLM response.
    2. Resolves each tool name from metadata['tools'], stripping Ollama's servername: prefix if present.
    3. Executes the tool — either mcp_client.call_tool() for MCP tools or calls the builtin callable directly.
    4. Appends the tool results as role=tool messages and makes a second LLM call to produce the final grounded answer.
  • Call _execute_tool_calls_non_streaming() from non_streaming_chat_response_handler whenever function_calling==native and tool_calls are present.
  • Apply the same Ollama prefix-normalization (servername:toolname → toolname) in the streaming path's tool-execution loop so both paths handle prefixed names consistently.
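The four steps above can be sketched end-to-end as follows. This is a hedged sketch, not the PR's actual code: the function and variable names are hypothetical, and the MCP-vs-builtin dispatch is reduced to a plain callable per tool entry:

```python
import json

def execute_tool_calls_non_streaming(response, tools_dict, call_llm):
    """Sketch of the four-step flow (all names hypothetical).
    `call_llm` stands in for the second LLM call; each tools_dict entry
    carries a `callable` standing in for mcp_client.call_tool() or a
    builtin tool function."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        return response  # nothing to do

    # Start the follow-up context with the assistant turn carrying tool_calls.
    messages = [choice["message"]]
    for tool_call in choice["message"].get("tool_calls", []):
        name = tool_call["function"]["name"]
        # Step 2: strip an Ollama-style "servername:" prefix if present.
        if name not in tools_dict and ":" in name:
            name = name.split(":", 1)[1]
        spec = tools_dict.get(name)
        if spec is None:
            result = f"Error: tool '{name}' not found"
        else:
            args = json.loads(tool_call["function"].get("arguments") or "{}")
            # Step 3: execute the tool (MCP or builtin).
            result = spec["callable"](**args)
        # Step 4a: append the result as a role=tool message.
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call["id"],
            "content": str(result),
        })
    # Step 4b: second LLM call to produce the final grounded answer.
    return call_llm(messages)
```

The key design point is that the non-streaming path now mirrors the streaming path's loop: execute each requested tool, feed the results back as role=tool messages, and let a second completion ground the final answer.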

Changelog Entry

Description

Bug fix for native function calling in non-streaming mode: MCP tool calls returned by the LLM were silently dropped because non_streaming_chat_response_handler had no tool-execution path.

Fixed

  • Native function calling now works in non-streaming mode: when finish_reason=tool_calls is present in a non-streaming response, tool calls are executed and a second LLM call is made to ground the final answer in the tool results.
  • Tool name prefix normalization (stripping Ollama's servername:toolname prefix) is now applied in both the streaming and non-streaming paths.

Testing

I tested this end-to-end on my self-hosted OpenWebUI instance running behind a Cloudflare reverse proxy tunnel. Setup: MCP server using FastMCP with stateless HTTP transport, authenticated via Bearer token, connected through Admin → Tool Servers UI. Model: Gemma4-26b via local Ollama. Settings: function_calling: native, reasoning_effort: none.

I sent chat messages with an MCP tool selected. Before this fix, the MCP server logs showed only ListToolsRequest — tools were never called. After applying this fix (along with PRs #24104 and #24105), I confirmed via the MCP server logs that CallToolRequest was received after the model emitted a tool call, and the model's final reply was grounded in the actual tool results. Tested specifically on the non-streaming path (stream: false via direct API call).
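A non-streaming request of the kind described above can be reproduced roughly as follows. The endpoint path, port, and model name here are assumptions for illustration, not taken from the PR:

```shell
# Hypothetical reproduction against a local Open WebUI instance.
curl -s http://localhost:3000/api/chat/completions \
  -H "Authorization: Bearer $OPEN_WEBUI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma3:27b",
        "stream": false,
        "messages": [{"role": "user", "content": "Ask the vault a question"}]
      }'
```

With the fix applied, the MCP server logs should show a CallToolRequest following the model's tool-call turn rather than only ListToolsRequest.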


Contributor License Agreement

  • By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 14:49:28 -05:00

Reference: github-starred/open-webui#43138