[GH-ISSUE #12135] issue: Non-streaming toolcall ignored, never resolves #16480

Closed
opened 2026-04-19 22:23:29 -05:00 by GiteaMirror · 7 comments

Originally created by @Arokha on GitHub (Mar 27, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12135

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.5.20

Ollama Version (if applicable)

n/a

Operating System

Ubuntu 22.04

Browser (if applicable)

Firefox

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

A model configured with native tool-calling functionality performs tool calls normally, and Open WebUI resolves them by invoking the tool functions and returning the data, regardless of the streaming-response setting.

Actual Behavior

When streaming responses are disabled, tool calls fail to resolve. The response is returned to Open WebUI but appears to be ignored, and the chat remains in the 'waiting for response' state forever.

Steps to Reproduce

(See my configuration below.)

1. In a chat, set Stream Chat Response: Off.

2. Send a chat message that will cause the (native tool-calling) LLM to invoke a tool. In my case, I'm using the example get date/time tools.

3. The chat spins forever; click the stop button to make it give up. (Note: OpenAI models don't like to reply AND invoke tools in the same response. Performing this test against an Anthropic model results in a 'Sure, let me do that for you.' type response plus the tool call, but the call is dropped in the same way as with an OpenAI model when streaming is disabled.)

4. Change the chat setting to Stream Chat Response: On, and click 'Regenerate' on the LLM's response (which is just the skeleton/placeholder text).

5. The tool executes, its output is returned to the LLM, and the LLM takes it and produces a reply.

Logs & Screenshots

I start a new chat with `Check the date and tell it to me`, with streaming disabled.

After my request to GPT-4o-mini, it replies with:

{"id": "chatcmpl-<redacted>", "choices": [{"finish_reason": "tool_calls", "index": 0, "logprobs": null, "message": {"content": null, "refusal": null, "role": "assistant", "annotations": [], "audio": null, "function_call": null, "tool_calls": [{"id": "call_<redacted>", "function": {"arguments": "{}", "name": "get_current_date"}, "type": "function"}]}}], "created": 1743114572, "model": "gpt-4o-mini-2024-07-18", "object": "chat.completion", "service_tier": "default", "system_fingerprint": "fp_<redacted>", "usage": {"completion_tokens": 12, "prompt_tokens": 2744, "total_tokens": 2756, "completion_tokens_details": {"accepted_prediction_tokens": 0, "audio_tokens": 0, "reasoning_tokens": 0, "rejected_prediction_tokens": 0}, "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0}}}

You can see this appears to be a normal attempt by the LLM to invoke a tool, but nothing is ever done with it. The chat remains spinning forever. The Open WebUI Docker container's only relevant log (how can I turn up logging?) is the 200 status for the request above:

open-webui | 2025-03-27 22:29:33.990 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 2600:<ipv6 redacted> - "POST /api/chat/completions HTTP/1.1" 200 - {}

If I perform the same steps with streaming ENabled:

GPT-4o Mini

Tool Executed
    get_current_date: Current Date: Thursday, March 27, 2025 
Today's date is Thursday, March 27, 2025.

Works fine.
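
For context, the step that never runs in the non-streaming case is the client-side resolution loop: on a non-streaming response, the client should notice `finish_reason: "tool_calls"`, invoke the named functions, and append the results as `tool` messages before re-querying the model. A minimal sketch of that loop (the `TOOLS` registry and `resolve_tool_calls` helper are illustrative, not actual Open WebUI code):

```python
import json

# Hypothetical local tool registry, mirroring the example date tool.
TOOLS = {"get_current_date": lambda: "Current Date: Thursday, March 27, 2025"}

def resolve_tool_calls(response: dict) -> list[dict]:
    """Given a non-streaming chat.completion payload, return the follow-up
    messages (the assistant's tool_calls echoed back, plus one 'tool'
    message per call) to append before re-querying the model.
    Returns [] when the model finished without requesting tools."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        return []
    message = choice["message"]
    followup = [{"role": "assistant", "tool_calls": message["tool_calls"]}]
    for call in message["tool_calls"]:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")
        result = TOOLS[fn["name"]](**args)  # run the tool locally
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    return followup
```

Feeding a payload like the JSON above through such a helper would yield the `tool` message that should have been sent back to the model.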

Additional Information

I'm using litellm as a proxy server to OpenAI, Anthropic, etc.

My setting on the admin-level model config is streaming: yes.

I did my testing against both OpenAI and Anthropic models; the only difference was whether they reply with a 'preface' message as they invoke tools (OpenAI does not, Anthropic does; this is normal behavior for their models).

I realize I don't have many interesting logs from Open WebUI about this, so if you can let me know how to generate more detailed logs from Open WebUI's Docker image, I'll do that and produce more logs.
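
For anyone else looking for more verbose logs: Open WebUI's logging documentation describes a `GLOBAL_LOG_LEVEL` environment variable that can be set on the container. A sketch of a debug-enabled container start (the image tag, port, and volume below are illustrative defaults, not taken from this report):

```shell
# Restart the container with debug logging enabled.
# GLOBAL_LOG_LEVEL is read by Open WebUI at startup.
docker run -d \
  -p 3000:8080 \
  -e GLOBAL_LOG_LEVEL=DEBUG \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```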

GiteaMirror added the bug label 2026-04-19 22:23:29 -05:00

@tjbck commented on GitHub (Mar 28, 2025):

Intended behaviour here, streaming must be enabled for tool calls to be invoked.


@Column01 commented on GitHub (Mar 28, 2025):

I take issue with this: non-streaming mode is required for many backends to return tool calls, which makes native tool calling useless in those cases (for example, llama.cpp's llama-server needs streaming off when using the `--jinja` flag, which is required for tool calling, and the front end gives an error if you try with streaming on).

I opened a new issue thinking this was unintended behavior; I didn't expect it to be intentionally left unsupported.

#12154


@Column01 commented on GitHub (Mar 28, 2025):

At the very least, a message should be displayed indicating this when native mode is on with streaming off. I spent a few days troubleshooting because of this!


@Arokha commented on GitHub (Mar 29, 2025):

If this is the case, then it should not even pass tools to the API at all! What is the point of offering tools to the API in that case?!


@Frank-Schiro commented on GitHub (Jun 20, 2025):

Thanks, seriously. Two days of going crazy trying to debug this.


@schematical commented on GitHub (Jun 25, 2025):

How exactly is this intended behavior? Where is that documented?

If it is intended behavior, why show the toggle button UI at all if it's not going to be honored?

I don't mean to be harsh but this is pretty confusing behavior.


@Column01 commented on GitHub (Jun 25, 2025):

> How exactly is this intended behavior? Where is that documented?
>
> If it is intended behavior why show the toggle button UI at all if its not going to send it?
>
> I don't mean to be harsh but this is pretty confusing behavior.

The answer is it's intentionally not complete kek

The code to run the tool calls exists, it's just only run when a streaming response is used.
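
One likely reason the two paths diverged: in a streamed response, a tool call arrives as partial fragments spread across many delta chunks (the `id` in one chunk, the `arguments` string split across several), so the streaming path already needs reassembly-plus-dispatch logic that the non-streaming path never grew. A sketch of that reassembly step (the `accumulate_tool_calls` helper is illustrative, not the actual Open WebUI code):

```python
def accumulate_tool_calls(deltas: list[dict]) -> list[dict]:
    """Merge partial tool_call fragments from streaming delta chunks
    into complete calls: fragments are keyed by their 'index', and the
    name/arguments strings are concatenated in arrival order."""
    calls: dict[int, dict] = {}
    for delta in deltas:
        for frag in delta.get("tool_calls", []):
            slot = calls.setdefault(
                frag["index"], {"id": "", "name": "", "arguments": ""}
            )
            if frag.get("id"):
                slot["id"] = frag["id"]
            fn = frag.get("function", {})
            if fn.get("name"):
                slot["name"] += fn["name"]
            if fn.get("arguments"):
                slot["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]
```

A non-streaming response delivers the tool calls whole and needs none of this, which is presumably why the dispatch call was only ever wired into the streaming branch.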


Reference: github-starred/open-webui#16480