[GH-ISSUE #19656] issue: Tool call response tokens are duplicated, causing 2x token consumption #18948
Originally created by @FujinoXiao on GitHub (Dec 1, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19656
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
v0.6.40
Ollama Version (if applicable)
No response
Operating System
Ubuntu 22.04
Browser (if applicable)
No response
Confirmation
I have read and followed all instructions in README.md.
Expected Behavior
The tool response should only be included once in the context. For example, if a tool returns content that should consume ~10,000 tokens, only ~10,000 tokens should be used.
Actual Behavior
The tool response content is duplicated in the context, causing 2x token consumption. When a tool returns content worth ~10,000 tokens, it actually consumes ~20,000 tokens because the response appears twice in the message history.
Steps to Reproduce
1. Go to Open WebUI and navigate to Workspace -> Tools -> Create a new tool.
2. Add the test tool code, which begins with "from pydantic import Field" (a reconstructed sketch follows this list).
3. Enable this tool in a chat.
4. Ask the model to call this tool (e.g., "Please use the repeat_test tool with count=10000").
5. Check the token usage.
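Only the first line of the tool code appears above. As a minimal sketch of such a tool, assuming Open WebUI's standard Tools-class convention and the repeat_test/count names used in the steps:

```python
from pydantic import Field


class Tools:
    def repeat_test(
        self,
        count: int = Field(
            default=10000, description="How many times to repeat the word 'test'"
        ),
    ) -> str:
        """
        Return the word 'test' repeated `count` times, so the tool output
        consumes roughly `count` tokens.
        """
        return "test" * count
```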
The word "test" is approximately 1 token, so repeating it 10,000 times should consume ~10,000 tokens plus a small amount for the user's question and model's response (maybe ~11,000 tokens total)
However, the actual token consumption shows ~22,000+ tokens, indicating the tool response is being duplicated
This issue scales linearly: if you set count=20000, the expected usage is ~20,000 tokens, but actual usage is ~40,000+ tokens. The tool response is consistently duplicated regardless of size.
Logs & Screenshots
Token consumption comparison:
Test 1: repeat_test with count=10000
Test 2: repeat_test with count=20000
Conclusion: Tool response tokens are consistently doubled, confirming duplication issue.
Additional Information
No response
@owui-terminator[bot] commented on GitHub (Dec 1, 2025):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#19390 issue: tools will double the token cost
by qq3829596922 • Nov 23, 2025 • bug
#19169 issue: System Prompt Duplication During Agentic Tool Calls Leading to Token Waste and Write-Cache Overprice
by alexis-dioxycle • Nov 13, 2025 • bug
#17058 issue: Response cannot be stopped after the tool is called
by EntropyYue • Aug 30, 2025 • bug
#17678 issue: OAuth token after some time missing in tool calls
by koflerm • Sep 23, 2025 • bug
#16721 issue: Continuous Repetition of Tool-Use Responses
by ziozzang • Aug 19, 2025 • bug
#17047 issue: Multiple tool calls cause repetitive text output
by pairwiserr • Aug 29, 2025 • bug
#15690 issue: tool calls fail when model makes multiple tool calls in one response
by Master-Pr0grammer • Jul 13, 2025 • bug
#16138 issue: tools names are doubled when calling
by latel • Jul 30, 2025 • bug
#19509 issue: User overview page calls /api/v1/users multiple times
by luke-wren • Nov 26, 2025 • bug
#12829 issue: Using "tools" causes the API to be called twice.
by KingPollux • Apr 14, 2025 • bug
💡 Tips:
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@FujinoXiao commented on GitHub (Dec 1, 2025):
This is a submission with detailed reproduction steps, test code, and token consumption data to properly document the bug. It is also my first time filing one.
@tjbck commented on GitHub (Dec 1, 2025):
@silentoplayz confirmation wanted here!
@silentoplayz commented on GitHub (Dec 1, 2025):
I tested this issue with an external GroqCloud model. It initially failed to call the tool provided by @qq3829596922 in the issue post, at least until I told the model in the next query to "FIX IT FOR ME", which it apparently obliged (and hit a token limit). That translated into a 20k+ token API call; it would have been roughly 10k tokens if the tool call response weren't being duplicated, ultimately confirming the reported 2x token consumption suspicion.
I had Kimi K2 Instruct fix the tool and I believe this confirms the issue further:

Tool code:
My local models are failing to use the tool successfully.

Edit: I lowered the repeat amount to 1k (x10 reduction) and the model called the tool fine.
@rgaricano commented on GitHub (Dec 1, 2025):
The problem occurs because tool results are being added to the message history twice: once during initial tool processing and again when converting content blocks back to messages for subsequent LLM calls.
First: in chat_completion_tools_handler, tool results are immediately added to the message history after processing, here: 140605e660/backend/open_webui/utils/middleware.py (L496-L499)
Second: in process_chat_response, when processing streaming responses, the content blocks (which include tool results) are converted back into messages and added to form_data["messages"], here: 140605e660/backend/open_webui/utils/middleware.py (L2993-L2998)
This should be fixable by simply removing the first addition (L496-L499).
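To make the double-append concrete, here is a minimal sketch of the pattern being described (illustrative only; the message shapes are simplified and this is not the actual middleware.py code):

```python
# Minimal sketch of the double-append described above (illustrative only;
# message shapes are simplified and this is not the actual middleware.py code).

form_data = {"messages": [{"role": "user", "content": "Use repeat_test with count=10000"}]}
tool_result = "test" * 10000  # ~10k tokens of tool output

# 1) chat_completion_tools_handler appends the result right after the tool runs.
form_data["messages"].append({"role": "tool", "content": tool_result})

# 2) process_chat_response later converts the content blocks (which already
#    contain the same tool result) back into messages and appends them again.
content_blocks = [{"type": "tool_result", "content": tool_result}]
for block in content_blocks:
    form_data["messages"].append({"role": "tool", "content": block["content"]})

# The follow-up LLM request now carries the ~10k-token result twice (~20k tokens).
copies = sum(tool_result in m["content"] for m in form_data["messages"])
print(copies)  # -> 2
```

Removing the first append, as suggested, would leave exactly one copy of the tool result in the outgoing request.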
@Ithanil commented on GitHub (Dec 1, 2025):
I'm not saying there is nothing to see here, since this is actually the second report of this issue (see https://github.com/open-webui/open-webui/issues/19390).
But to actually understand the issue, I think we first need to look at the requests received by the LLM backend. If I use the test tool (with a low default repeat count) with native tool calling, I get the following request on the second turn (i.e., after the tool was called):
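As a hedged illustration (not the actual request from this comment), a second-turn request with native, OpenAI-style tool calling generally has this shape, with the tool output appearing exactly once:

```python
# Illustrative shape of a native (OpenAI-style) tool-calling request on the
# second turn; not the actual request from this comment.
request = {
    "model": "example-model",  # placeholder name
    "messages": [
        {"role": "user", "content": "Call the test tool"},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {"name": "repeat_test", "arguments": "{}"},
                }
            ],
        },
        {
            # The tool result appears exactly once, as a single tool message.
            "role": "tool",
            "tool_call_id": "call_1",
            "content": "Hello, World!Hello, World!Hello, World!",
        },
    ],
}
```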
And in my opinion this is what I would expect: I see "Hello, World!Hello, World!Hello, World!" only once. Could you please look at the actual requests and point out where the apparent doubling occurs?
EDIT:
What I do find strange, though, is how the previous tool results are included in further rounds of chat:
@rgaricano commented on GitHub (Dec 1, 2025):
The issue only occurs with non-native function calling; native function calling doesn't have this duplication.
@Ithanil commented on GitHub (Dec 1, 2025):
Oh OK, thanks for clarifying. That wasn't clear to me.
In the case of non-native calling, the messages look as follows, confirming the issue (the tool result appears once in the <context> and then again appended as "Tool test_tool/repeat_test Output: testtesttest..."):
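The message dump itself is not shown here; as a rough sketch (roles and exact formatting are assumptions, not taken from the actual dump), the duplicated non-native request looks something like:

```python
# Rough sketch of the duplicated second-turn request in non-native mode
# (roles and exact formatting are assumptions for illustration).
tool_output = "test" * 10  # stands in for the full multi-thousand-token result

messages = [
    {
        "role": "user",
        # Copy 1: the tool result injected into the <context> block that
        # non-native function calling places in the prompt.
        "content": "<context>\n" + tool_output + "\n</context>\n\nUse repeat_test",
    },
    {
        # Copy 2: the same result appended again as a plain text message.
        "role": "assistant",
        "content": "Tool test_tool/repeat_test Output: " + tool_output,
    },
]

# Both copies go to the model, so the tool output is billed twice.
assert sum(tool_output in m["content"] for m in messages) == 2
```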
@tjbck commented on GitHub (Dec 1, 2025):
52ccab8fc0
@FujinoXiao commented on GitHub (Dec 2, 2025):
Why so fast? I just wanted to fix the bug myself to become a contributor.