issue: Continuous Repetition of Tool-Use Responses #6133

Closed
opened 2025-11-11 16:45:46 -06:00 by GiteaMirror · 2 comments
Owner

Originally created by @ziozzang on GitHub (Aug 19, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.6.22

Ollama Version (if applicable)

No response

Operating System

linux

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When using GLM-4.5 with vLLM for tool use, an unusual phenomenon occurs: a specific phrase related to the tool call is repeated continuously during the process.

Upon investigation, this issue occurs under the following conditions:

  1. User query.
  2. LLM response ("I will use a tool").
  3. Reasoning (<think>).
  4. Tool use and response.

Expected:

  • The step-2 LLM response before the tool use is printed only once.
  • Steps 3-4 then repeat for each tool use (3-4 for one tool, 3-4 for another, ...).
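The expected rendering described above can be sketched as a small, self-contained simulation (the event names and structure are hypothetical, not Open WebUI's actual internals):

```python
# Hedged sketch: simulate the expected display of a multi-tool turn.
# The pre-tool message from step 2 should be shown exactly once, even
# when steps 3-4 (reasoning + tool call) repeat for several tools.

def render_turn(events):
    """Collapse repeated pre-tool messages into a single display entry."""
    shown = []
    seen_pre_tool = set()
    for kind, text in events:
        if kind == "pre_tool_message":
            if text in seen_pre_tool:
                continue  # duplicate: skip instead of re-displaying
            seen_pre_tool.add(text)
        shown.append((kind, text))
    return shown

# Stream as observed in the bug: the step-2 message reappears before
# every subsequent tool call.
events = [
    ("pre_tool_message", "I will use a tool."),
    ("reasoning", "<think>pick the search tool</think>"),
    ("tool_call", "search(query='...')"),
    ("pre_tool_message", "I will use a tool."),   # repeated by the bug
    ("reasoning", "<think>pick another tool</think>"),
    ("tool_call", "fetch(url='...')"),
]

rendered = render_turn(events)
# The duplicated pre-tool message is dropped; order is otherwise preserved.
```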

Actual Behavior

When steps 3 and 4 are repeated, the response from step 2 is output again between each repetition. Since this content does not exist in the actual API-level request to the LLM, it appears to be an issue with Open WebUI's output handling. The message should be displayed only once, not repeatedly.

Steps to Reproduce

Docker, v0.6.22.

This is the same issue:
https://www.reddit.com/r/LocalLLaMA/comments/1mekoy8/agentic_email_workflow_inside_of_openwebui/

Unlike other models, GLM-4.5 appears to generate its current status outside of the <think> tag during tool use. Even with this model, the tool's invocation and usage work correctly; the only problem is the repeated output in the Open WebUI display.
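One way to illustrate the model characteristic above: status text emitted outside the <think> span could, in principle, be filtered out during a tool-use turn. This is a hedged sketch of such a filter, not Open WebUI's actual parsing code:

```python
import re

# Hedged sketch: keep only the <think>...</think> spans from a chunk of
# GLM-4.5 output, dropping stray status text emitted outside the tags
# during a tool-use turn.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def keep_only_think(content: str) -> str:
    """Return just the <think>...</think> spans, discarding text outside them."""
    return "".join(THINK_RE.findall(content))

chunk = "I will use a tool.<think>choose the email tool</think>Calling now..."
# keep_only_think(chunk) retains only the reasoning span.
```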

Logs & Screenshots

No log, but a screenshot is posted in the thread below.

https://www.reddit.com/r/LocalLLaMA/comments/1mekoy8/agentic_email_workflow_inside_of_openwebui/

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 16:45:46 -06:00

@ziozzang commented on GitHub (Aug 19, 2025):

LLM-level temporary solution:

  • Add a system prompt like this:

```
When using a tool, do not generate any responses until the tool usage is complete. All responses must be generated only after the usage is finished.
```
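The workaround above can be wired into an OpenAI-compatible request to vLLM roughly like this (the model name and user message are placeholders for illustration):

```python
# Hedged sketch: apply the workaround as a system message in an
# OpenAI-compatible chat completions payload sent to vLLM.
WORKAROUND = (
    "When using a tool, do not generate any responses until the tool usage "
    "is complete. All responses must be generated only after the usage is finished."
)

payload = {
    "model": "zai-org/GLM-4.5",  # placeholder; use your served model name
    "messages": [
        {"role": "system", "content": WORKAROUND},
        {"role": "user", "content": "Check my inbox and summarize new mail."},
    ],
    "stream": True,
}
```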

@tjbck commented on GitHub (Aug 19, 2025):

Most likely an issue with the model itself; we're unable to reproduce with OpenAI models. Keep us updated.

Reference: github-starred/open-webui#6133