[GH-ISSUE #12135] issue: Non-streaming toolcall ignored, never resolves #16480

Closed
opened 2026-04-19 22:23:29 -05:00 by GiteaMirror · 7 comments

Originally created by @Arokha on GitHub (Mar 27, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12135

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.5.20

Ollama Version (if applicable)

n/a

Operating System

Ubuntu 22.04

Browser (if applicable)

Firefox

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

A model configured with native tool-calling functionality performs tool calls normally, and Open WebUI resolves them by invoking the tool functions and returning the data, regardless of the streaming-response setting.

Actual Behavior

When streaming responses are disabled, tool calls fail to resolve. The response is returned to Open WebUI but appears to be ignored, and the chat remains in the 'waiting for response' state forever.

Steps to Reproduce

(See my configuration below.)

1. In a chat, set Stream Chat Response: Off.

2. Send a chat message that will cause the (native tool-calling) LLM to invoke a tool. In my case, I'm using the example get date/time tools.

3. The chat spins forever; click the stop button to make it give up. (Note: OpenAI models don't like to reply AND invoke tools in the same response. Performing this test against an Anthropic model results in a 'Sure, let me do that for you.' type response plus the tool call, but the call is dropped in the same way as with an OpenAI model when streaming is disabled.)

4. Change the chat setting to Stream Chat Response: On, and click 'Regenerate' on the LLM's response (which is just the skeleton/placeholder text).

5. The tool executes, its output is returned to the LLM, and the LLM takes it and produces a reply.

Logs & Screenshots

I start a new chat with `Check the date and tell it to me`, with streaming disabled.

After my request to GPT-4o-mini, it replies with:

{"id": "chatcmpl-<redacted>", "choices": [{"finish_reason": "tool_calls", "index": 0, "logprobs": null, "message": {"content": null, "refusal": null, "role": "assistant", "annotations": [], "audio": null, "function_call": null, "tool_calls": [{"id": "call_<redacted>", "function": {"arguments": "{}", "name": "get_current_date"}, "type": "function"}]}}], "created": 1743114572, "model": "gpt-4o-mini-2024-07-18", "object": "chat.completion", "service_tier": "default", "system_fingerprint": "fp_<redacted>", "usage": {"completion_tokens": 12, "prompt_tokens": 2744, "total_tokens": 2756, "completion_tokens_details": {"accepted_prediction_tokens": 0, "audio_tokens": 0, "reasoning_tokens": 0, "rejected_prediction_tokens": 0}, "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0}}}

You can see this appears to be a normal attempt by the LLM to invoke a tool, but nothing is ever done with it. The chat remains spinning forever. The Open WebUI Docker container's only relevant log (how can I turn up logging?) is the 200 status for the request above:

open-webui | 2025-03-27 22:29:33.990 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 2600:<ipv6 redacted> - "POST /api/chat/completions HTTP/1.1" 200 - {}

If I perform the same steps with streaming ENabled:

GPT-4o Mini

Tool Executed
    get_current_date: Current Date: Thursday, March 27, 2025 
Today's date is Thursday, March 27, 2025.

Works fine.
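
For context, the step that never runs in the non-streaming case is the client-side resolution loop: on a non-streaming response, the client should notice `finish_reason: "tool_calls"`, invoke the named functions, and append the results as `tool` messages before re-querying the model. A minimal sketch of that loop (the `TOOLS` registry and `resolve_tool_calls` helper are illustrative, not actual Open WebUI code):

```python
import json

# Hypothetical local tool registry, mirroring the example date tool.
TOOLS = {"get_current_date": lambda: "Current Date: Thursday, March 27, 2025"}

def resolve_tool_calls(response: dict) -> list[dict]:
    """Given a non-streaming chat.completion payload, return the follow-up
    messages (the assistant's tool_calls echoed back, plus one 'tool'
    message per call) to append before re-querying the model.
    Returns [] when the model finished without requesting tools."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        return []
    message = choice["message"]
    followup = [{"role": "assistant", "tool_calls": message["tool_calls"]}]
    for call in message["tool_calls"]:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")
        result = TOOLS[fn["name"]](**args)  # run the tool locally
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    return followup
```

Feeding a payload like the JSON above through such a helper would yield the `tool` message that should have been sent back to the model.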

Additional Information

I'm using litellm as a proxy server to OpenAI, Anthropic, etc.

My setting on the admin-level model config is streaming: yes.

I did my testing against both OpenAI and Anthropic models; the only difference was whether they reply with a 'preface' message as they invoke tools (OpenAI does not, Anthropic does; this is normal behavior for their models).

I realize I don't have many interesting logs from Open WebUI about this, so if you can let me know how to generate more detailed logs from Open WebUI's Docker image, I'll do that and produce more logs.
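
For anyone else looking for more verbose logs: Open WebUI's logging documentation describes a `GLOBAL_LOG_LEVEL` environment variable that can be set on the container. A sketch of a debug-enabled container start (the image tag, port, and volume below are illustrative defaults, not taken from this report):

```shell
# Restart the container with debug logging enabled.
# GLOBAL_LOG_LEVEL is read by Open WebUI at startup.
docker run -d \
  -p 3000:8080 \
  -e GLOBAL_LOG_LEVEL=DEBUG \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```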

GiteaMirror added the bug label 2026-04-19 22:23:29 -05:00

@tjbck commented on GitHub (Mar 28, 2025):

Intended behaviour here, streaming must be enabled for tool calls to be invoked.


@Column01 commented on GitHub (Mar 28, 2025):

I take issue with this: non-streaming mode is required for many backends to return tool calls, which makes native tool calling useless in those cases (for example, llama.cpp's llama-server needs streaming off when using the `--jinja` flag, which is required for tool calling, and the front end gives an error if you try with streaming on).

I opened a new issue thinking this was unintended behavior; I didn't expect it to be intentionally left unsupported.

#12154


@Column01 commented on GitHub (Mar 28, 2025):

At the very least, a message should be displayed indicating this when native mode is on with streaming off. I spent a few days troubleshooting because of this!


@Arokha commented on GitHub (Mar 29, 2025):

If this is the case, then it should not even pass tools to the API at all! What is the point of offering tools to the API in that case?!


@Frank-Schiro commented on GitHub (Jun 20, 2025):

Thanks, seriously. Two days of going crazy trying to debug this.


@schematical commented on GitHub (Jun 25, 2025):

How exactly is this intended behavior? Where is that documented?

If it is intended behavior, why show the toggle button UI at all if it's not going to be honored?

I don't mean to be harsh but this is pretty confusing behavior.


@Column01 commented on GitHub (Jun 25, 2025):

> How exactly is this intended behavior? Where is that documented?
>
> If it is intended behavior why show the toggle button UI at all if its not going to send it?
>
> I don't mean to be harsh but this is pretty confusing behavior.

The answer is it's intentionally not complete kek

The code to run the tool calls exists, it's just only run when a streaming response is used.
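
One likely reason the two paths diverged: in a streamed response, a tool call arrives as partial fragments spread across many delta chunks (the `id` in one chunk, the `arguments` string split across several), so the streaming path already needs reassembly-plus-dispatch logic that the non-streaming path never grew. A sketch of that reassembly step (the `accumulate_tool_calls` helper is illustrative, not the actual Open WebUI code):

```python
def accumulate_tool_calls(deltas: list[dict]) -> list[dict]:
    """Merge partial tool_call fragments from streaming delta chunks
    into complete calls: fragments are keyed by their 'index', and the
    name/arguments strings are concatenated in arrival order."""
    calls: dict[int, dict] = {}
    for delta in deltas:
        for frag in delta.get("tool_calls", []):
            slot = calls.setdefault(
                frag["index"], {"id": "", "name": "", "arguments": ""}
            )
            if frag.get("id"):
                slot["id"] = frag["id"]
            fn = frag.get("function", {})
            if fn.get("name"):
                slot["name"] += fn["name"]
            if fn.get("arguments"):
                slot["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]
```

A non-streaming response delivers the tool calls whole and needs none of this, which is presumably why the dispatch call was only ever wired into the streaming branch.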


Reference: github-starred/open-webui#16480