[GH-ISSUE #21768] issue: OpenAI-compatible streaming: finish_reason incorrectly returned as "stop" after streaming tool_calls #90302

2026-05-15T15:30:42-05:00

GiteaMirror commented

2026-05-15 15:30:42 -05:00

Originally created by @Sechma on GitHub (Feb 23, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21768

Check Existing Issues

I have searched for any existing and/or related issues.
I have searched for any existing and/or related discussions.
I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.8.4

Ollama Version (if applicable)

No response

Operating System

Windows 11

Browser (if applicable)

No response

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have provided every relevant configuration, setting, and environment variable used in my setup.
I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
Start with the initial platform/version/OS and dependencies used,
Specify exact install/launch/configure commands,
List URLs visited, user input (incl. example values/emails/passwords if needed),
Describe all options and toggles enabled or changed,
Include any files or environmental changes,
Identify the expected and actual result at each stage,
Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When using OpenWebUI’s OpenAI-compatible endpoint (/api/v1/chat/completions) with stream: true and function/tool calling enabled, the streaming response should preserve correct OpenAI-compatible semantics.

If the model emits a tool call during streaming (i.e., delta.tool_calls is present in one or more chunks), the final streamed chunk should include:

finish_reason: "tool_calls"

This is required so agent frameworks (such as OpenCode or Vercel AI SDK) can correctly detect that a tool execution step is required and continue the agent loop.

This behavior is correctly observed when:

Using stream: false via OpenWebUI

Streaming directly against Ollama’s OpenAI-compatible endpoint
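
To illustrate the expected semantics, here is a minimal sketch of how a client might decide whether a tool execution step is required. The helper names `final_finish_reason` and `needs_tool_execution` are hypothetical and are not taken from OpenCode or the Vercel AI SDK; the chunk dictionaries only mirror the OpenAI-style shape shown in this report, with fields trimmed for brevity.

```python
from typing import Iterable, Optional


def final_finish_reason(chunks: Iterable[dict]) -> Optional[str]:
    """Return the last non-null finish_reason seen for choice 0."""
    reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) == 0 and choice.get("finish_reason"):
                reason = choice["finish_reason"]
    return reason


def needs_tool_execution(chunks: Iterable[dict]) -> bool:
    """Hypothetical agent-loop check: continue only on 'tool_calls'."""
    return final_finish_reason(chunks) == "tool_calls"


# Already-parsed chunks shaped like the ones in this report (assumed, trimmed).
stream = [
    {"choices": [{"index": 0, "finish_reason": None, "delta": {
        "tool_calls": [{"index": 0, "id": "call_x", "type": "function",
                        "function": {"name": "get_time",
                                     "arguments": "{\"city\": \"Brno\"}"}}]}}]},
    {"choices": [{"index": 0, "finish_reason": "stop", "delta": {}}]},
]

print(needs_tool_execution(stream))  # False
```

A client written this way ends its loop on the stream above even though a tool call was streamed, which is why the final `finish_reason` matters.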

Actual Behavior

When using stream: true through OpenWebUI:

The server correctly streams delta.tool_calls chunks.

However, the final chunk incorrectly sets:

finish_reason: "stop"

instead of:

finish_reason: "tool_calls"

As a result, agent frameworks interpret the response as a completed generation rather than a tool invocation, and the agent loop terminates prematurely.

Notably, this issue does not occur in non-streaming mode (stream: false), where the correct finish_reason: "tool_calls" is returned.
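
Until a server-side fix is available, a client or proxy could in principle compensate for this. The following is only a sketch under stated assumptions: `normalize_finish_reason` is a hypothetical helper, not Open WebUI code, and it assumes a single choice per stream and chunks that are already parsed into dictionaries.

```python
from typing import Iterable, Iterator


def normalize_finish_reason(chunks: Iterable[dict]) -> Iterator[dict]:
    """Yield chunks, rewriting a final "stop" to "tool_calls" if a tool call was streamed."""
    saw_tool_call = False
    for chunk in chunks:
        if "choices" not in chunk:
            yield chunk
            continue
        new_choices = []
        for choice in chunk["choices"]:
            delta = choice.get("delta") or {}
            if delta.get("tool_calls"):
                saw_tool_call = True
            if saw_tool_call and choice.get("finish_reason") == "stop":
                # Copy before editing so the caller's objects stay untouched.
                choice = {**choice, "finish_reason": "tool_calls"}
            new_choices.append(choice)
        yield {**chunk, "choices": new_choices}


# Assumed, trimmed chunks in the shape reported in this issue.
chunks = [
    {"choices": [{"index": 0, "finish_reason": None, "delta": {
        "tool_calls": [{"index": 0, "id": "call_x", "type": "function",
                        "function": {"name": "get_time", "arguments": "{}"}}]}}]},
    {"choices": [{"index": 0, "finish_reason": "stop", "delta": {}}]},
]

fixed = list(normalize_finish_reason(chunks))
print(fixed[-1]["choices"][0]["finish_reason"])  # tool_calls
```

Streams that never contain `delta.tool_calls` pass through unchanged, so ordinary completions still end with `"stop"`.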

Steps to Reproduce

Run the following command (fill in your own API token):

  curl -N http://localhost:3000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-7547f767d86047588cea1afe7ac3e22c" \
  -d '{
    "model":"qwen3:8b",
    "stream": true,
    "messages":[{"role":"user","content":"Call tool  get_time city Brno."}],
    "tools":[{"type":"function","function":{"name":"get_time","description":"return time for city","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
    "tool_choice":"auto"
  }'

Logs & Screenshots

Response:

data: {"id": "qwen3:8b-1a45d77e-6704-4386-a0cb-4be8a1707ccd", "created": 1771833945, "model": "qwen3:8b", "choices": [{"index": 0, "logprobs": null, "finish_reason": null, "delta": {"reasoning_content": " City"}}], "object": "chat.completion.chunk"}

data: {"id": "qwen3:8b-a9b590e3-0aa5-40c2-b2c2-5f0a28f3f6a9", "created": 1771833945, "model": "qwen3:8b", "choices": [{"index": 0, "logprobs": null, "finish_reason": null, "delta": {"reasoning_content": "\".\n"}}], "object": "chat.completion.chunk"}

data: {"id": "qwen3:8b-c90ef64e-8cef-4278-9a56-015bb19c4357", "created": 1771833947, "model": "qwen3:8b", "choices": [{"index": 0, "logprobs": null, "finish_reason": null, "delta": {"tool_calls": [{"index": 0, "id": "call_tyvbvhga", "type": "function", "function": {"name": "get_time", "arguments": "{\"city\": \"New York City\"}"}}]}}], "object": "chat.completion.chunk"}

data: {"id": "qwen3:8b-7f14644d-820d-4b98-bcbb-0568404749fd", "created": 1771833947, "model": "qwen3:8b", "choices": [{"index": 0, "logprobs": null, "finish_reason": "stop", "delta": {}}], "object": "chat.completion.chunk", "usage": {"input_tokens": 138, "output_tokens": 113, "total_tokens": 251, "prompt_tokens": 138, "completion_tokens": 113, "response_token/s": 10.35, "prompt_token/s": 106.66, "total_duration": 12431268390, "load_duration": 173481661, "prompt_eval_count": 138, "prompt_eval_duration": 1293868611, "eval_count": 113, "eval_duration": 10919403663, "approximate_total": "0h0m12s", "completion_tokens_details": {"reasoning_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}}

So the final finish_reason is "stop" even though a tool call was emitted.
After emitting a tool call in the stream, the final chunk should set:
finish_reason = "tool_calls"
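
A captured stream can also be checked mechanically. The sketch below parses `data:` lines like the ones above and reports whether a tool call was streamed without a matching final `finish_reason`. The function names are hypothetical, the sample payloads are trimmed versions of the logged chunks, and the `[DONE]` handling follows the usual OpenAI streaming convention (this log does not show that sentinel).

```python
import json


def parse_sse_data_lines(raw: str) -> list:
    """Parse 'data: {...}' lines from a captured stream into dicts."""
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Skip empty payloads and the OpenAI-style end-of-stream sentinel.
        if not payload or payload == "[DONE]":
            continue
        chunks.append(json.loads(payload))
    return chunks


def has_finish_reason_mismatch(chunks: list) -> bool:
    """True if a tool call was streamed but the last finish_reason is not 'tool_calls'."""
    saw_tool_call = False
    last_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if (choice.get("delta") or {}).get("tool_calls"):
                saw_tool_call = True
            if choice.get("finish_reason"):
                last_reason = choice["finish_reason"]
    return saw_tool_call and last_reason != "tool_calls"


# Trimmed copy of the last two logged chunks (raw string keeps the JSON escapes).
captured = r'''
data: {"choices": [{"index": 0, "finish_reason": null, "delta": {"tool_calls": [{"index": 0, "id": "call_tyvbvhga", "type": "function", "function": {"name": "get_time", "arguments": "{\"city\": \"New York City\"}"}}]}}]}

data: {"choices": [{"index": 0, "finish_reason": "stop", "delta": {}}]}
'''

print(has_finish_reason_mismatch(parse_sse_data_lines(captured)))  # True
```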

Additional Information

The non-streaming variant works fine, but many AI agents use only the streaming variant:

 curl -s http://localhost:3000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer  TOKEN" \
  -d '{
    "model": "qwen3:8b",
    "stream": false,
    "messages": [
      {"role": "user", "content": "Call the get_time tool for the city New York."}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_time",
          "description": "Returns time for a city",
          "parameters": {
            "type": "object",
            "properties": { "city": { "type": "string" } },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }' | jq .
{
  "id": "qwen3:8b-d4ab1adc-c816-4ae3-91e1-66cf7f83f131",
  "created": 1771834148,
  "model": "qwen3:8b",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": "",
        "reasoning_content": "Okay, the user wants me to call the get_time function for New York. Let me check the tools provided. The function is called get_time and it takes a city parameter. The arguments should be a JSON object with the city key. So I need to structure the tool call correctly. The city is New York, so the arguments would be {\"city\": \"New York\"}. I'll make sure to use the proper JSON format and enclose it in the tool_call tags. Let me double-check the syntax to avoid any errors.\n",
        "tool_calls": [
          {
            "index": 0,
            "id": "call_2pqu7pen",
            "type": "function",
            "function": {
              "name": "get_time",
              "arguments": "{\"city\": \"New York\"}"
            }
          }
        ]
      }
    }
  ],
  "object": "chat.completion",
  "usage": {
    "input_tokens": 140,
    "output_tokens": 133,
    "total_tokens": 273,
    "prompt_tokens": 140,
    "completion_tokens": 133,
    "response_token/s": 10.55,
    "prompt_token/s": 814.45,
    "total_duration": 13007054614,
    "load_duration": 191915323,
    "prompt_eval_count": 140,
    "prompt_eval_duration": 171894673,
    "eval_count": 133,
    "eval_duration": 12612171664,
    "approximate_total": "0h0m13s",
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}


GiteaMirror added the bug label 2026-05-15 15:30:42 -05:00

GiteaMirror closed this issue

2026-05-15 15:30:43 -05:00

GiteaMirror commented

2026-05-15 15:30:44 -05:00

@Classic298 commented on GitHub (Feb 24, 2026):

Can you confirm whether this is solved by https://github.com/open-webui/open-webui/commit/0b867590a8487c69c676d85fef9241623614410c?


GiteaMirror commented

2026-05-15 15:30:45 -05:00

@tnndclub commented on GitHub (Feb 25, 2026):

I have the same issue: tool calls no longer work after upgrading to release 0.8.5.
There was no such issue on release 0.8.3.


GiteaMirror commented

2026-05-15 15:30:45 -05:00

@Classic298 commented on GitHub (Feb 25, 2026):

Could someone please confirm whether this is fixed in dev? Thanks.

A fix might already be in place, so additional error reports do not move this forward. We need to know whether the error still occurs on dev for you. Thanks.

https://docs.openwebui.com/getting-started/development


GiteaMirror commented

2026-05-15 15:30:45 -05:00

@Classic298 commented on GitHub (Mar 1, 2026):

Is nobody willing to confirm?


GiteaMirror commented

2026-05-15 15:30:45 -05:00

@Sechma commented on GitHub (Mar 2, 2026):

I can confirm that it helps and the issue is fixed:

curl -N http://localhost:3000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-7547f767d86047588cea1afe7ac3e22c" \
  -d '{
    "model":"qwen3:8b",
    "stream": true,
    "messages":[{"role":"user","content":"Call tool  get_time city Brno."}],
    "tools":[{"type":"function","function":{"name":"get_time","description":"return time for city","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
    "tool_choice":"auto"
  }'
data: {"id": "qwen3:8b-03001b1f-4b29-424d-97f4-b52c9cc9b203", "created": 1772461452, "model": "qwen3:8b", "choices": [{"index": 0, "logprobs": null, "finish_reason": null, "delta": {"reasoning_content": "Okay"}}], "object": "chat.completion.chunk"}

data: {"id": "qwen3:8b-17e7697f-18ce-4c46-8d40-f0d725fc25c7", "created": 1772461452, "model": "qwen3:8b", "choices": [{"index": 0, "logprobs": null, "finish_reason": null, "delta": {"reasoning_content": ","}}], "object": "chat.completion.chunk"}
...
data: {"id": "qwen3:8b-e8bc31e7-d1e7-4722-83f3-0e44a7d8e17e", "created": 1772461467, "model": "qwen3:8b", "choices": [{"index": 0, "logprobs": null, "finish_reason": null, "delta": {"tool_calls": [{"index": 0, "id": "call_i6hnhc9u", "type": "function", "function": {"name": "get_time", "arguments": "{\"city\": \"Brno\"}"}}]}}], "object": "chat.completion.chunk"}

data: {"id": "qwen3:8b-d7d3bc73-2bb6-4f94-b4b0-987b005192f2", "created": 1772461467, "model": "qwen3:8b", "choices": [{"index": 0, "logprobs": null, "finish_reason": "tool_calls", "delta": {}}], "object": "chat.completion.chunk", "usage": {"input_tokens": 137, "output_tokens": 134, "total_tokens": 271, "prompt_tokens": 137, "completion_tokens": 134, "response_token/s": 9.72, "prompt_token/s": 71.45, "total_duration": 23843435103, "load_duration": 8043917737, "prompt_eval_count": 137, "prompt_eval_duration": 1917491251, "eval_count": 134, "eval_duration": 13789469847, "approximate_total": "0h0m23s", "completion_tokens_details": {"reasoning_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0}}}



GiteaMirror commented

2026-05-15 15:30:46 -05:00

@tnndclub commented on GitHub (Mar 2, 2026):

@Classic298
It doesn't help.
Here is what I did:

Downloaded 0.8.5
Replaced response.py

Tool calling is still failing.

I also tried the latest release, 0.8.7, and tool calling works there.


GiteaMirror commented

2026-05-15 15:30:47 -05:00

@Classic298 commented on GitHub (Mar 2, 2026):

@Sechma so it works for you.

@tnndclub you said it doesn't help, but you are also saying that it works in 0.8.7?

So does it work? I will close this now, as I also suspect it should work now.
Thanks.


Reference: github-starred/open-webui#90302