[GH-ISSUE #12154] issue: Native tool calling with streaming mode off does nothing with returned tool calls
Originally created by @Column01 on GitHub (Mar 28, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12154
Check Existing Issues
Installation Method
Pip Install
Open WebUI Version
dev (or v0.5.20)
Ollama Version (if applicable)
llama-server b4942 (llama.cpp cuda backend)
Operating System
Windows 10
Browser (if applicable)
No response
Confirmation
Expected Behavior
When running a model locally using "Native" tool calling mode, the model should return valid tool calls (it does), and Open WebUI should execute those tool calls, running extra inference steps with their results as needed.
Actual Behavior
The backend returns valid tool calls for the prompt, but the WebUI never executes them as expected. The finish_reason is tool_calls, but the frontend and backend treat it as if the conversation turn is done when it is not; the model expects the tool calls to be run and an additional inference step to happen to show their results.

In this case, the model is trying to find the latitude and longitude of Toronto using the web search tool I made, but the WebUI never runs it. The logs attached below are from this conversation.
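For reference, the non-streaming response I get back looks roughly like the sketch below. The shape follows the OpenAI-compatible chat completions format that llama-server speaks; the concrete id, tool name, and arguments here are made up for illustration.

```python
# Rough illustration only: the shape follows the OpenAI-compatible
# chat completions format; the concrete values are invented.
example_response = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "tool_calls",  # signals tool calls still need to be executed
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_0",  # made-up id
                        "type": "function",
                        "function": {
                            "name": "web_search",  # my tool; arguments are illustrative
                            "arguments": '{"query": "latitude and longitude of Toronto"}',
                        },
                    }
                ],
            },
        }
    ],
}
```

Instead of executing the tool call and running a follow-up inference step with its result, the WebUI just ends the turn here.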
Steps to Reproduce
Set Function Calling to Native and Stream Chat Response to False (required for tool calling with llama.cpp).

Logs & Screenshots
llama-server.log
webUI.log
Additional Information
I tried to reach out for assistance on Discord by posting a troubleshooting thread, but nobody there was able to help me with this; it seems like a bug.
b03fc97e28/backend/open_webui/utils/middleware.py (L787-L803)

This code should have something inside the if metadata.get("function_calling") == "native": branch, similar to the non-native mode below it, where it runs the tool-calling inference steps. As far as I can tell from poking around, the native mode never runs the tool calls, but I could be mistaken, as I do not fully understand the codebase.
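To make the suggestion concrete, here is a minimal sketch of the kind of follow-up loop I would expect inside that native branch when streaming is off. This is not the actual Open WebUI code: run_native_tool_loop, chat_completion, and execute_tool are hypothetical stand-ins for whatever helpers the middleware really uses.

```python
from typing import Any, Callable


def run_native_tool_loop(
    messages: list[dict[str, Any]],
    chat_completion: Callable[[list[dict[str, Any]]], dict[str, Any]],
    execute_tool: Callable[[str, str], str],
    max_steps: int = 5,
) -> dict[str, Any]:
    """Hypothetical sketch: keep executing returned tool calls and
    re-running inference until the model finishes normally."""
    response = chat_completion(messages)
    for _ in range(max_steps):
        choice = response["choices"][0]
        if choice.get("finish_reason") != "tool_calls":
            break  # normal completion, the turn really is done
        assistant_msg = choice["message"]
        messages.append(assistant_msg)
        # Execute each returned tool call and append its result as a tool message.
        for call in assistant_msg.get("tool_calls", []):
            result = execute_tool(call["function"]["name"], call["function"]["arguments"])
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": result,
                }
            )
        # Run another inference step so the model can use the tool results.
        response = chat_completion(messages)
    return response
```

The non-native path below that branch already does something along these lines; the point is that, with streaming off, the native branch currently returns without ever entering a loop like this.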
@Column01 commented on GitHub (Mar 28, 2025):

The linked issue is relevant: I have streaming mode off because it's required to be off for my backend. This seems like a silly thing to restrict, in my opinion.