[GH-ISSUE #23339] feat: Add model-level toggle to disable reinjecting reasoning/thinking into prompts (prevents <think> tag imitation and rendering/parsing failures) #58620
Originally created by @ShirasawaSama on GitHub (Apr 2, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23339
Problem Description
Open WebUI reconstructs chat history from stored output items and may reinject reasoning/thinking (e.g. <think>...</think>) back into the next-turn messages (via convert_output_to_messages(..., raw=True)). For some providers/models, once these tags appear in history, the model starts imitating/amplifying them (multiple/nested/variant thinking tags), which can eventually break frontend reasoning parsing/rendering (thinking blocks leak into normal text, duplicated blocks, malformed markdown/artifacts, etc.).
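For illustration, a minimal sketch of what a polluted next-turn history can look like once the previous turn's reasoning is folded back into the assistant content (the message shapes are illustrative, not the exact payload Open WebUI builds):

```python
# Illustrative only: an assistant turn whose reasoning was folded back into
# `content` as a <think> block; on the next turn the model sees its own
# thinking as ordinary text and may start imitating the tag.
history = [
    {"role": "user", "content": "Summarize this article."},
    {
        "role": "assistant",
        "content": (
            "<think>The user wants a short summary; pick three points...</think>\n"
            "Here are the key points: ..."
        ),
    },
    {"role": "user", "content": "Now translate the summary into French."},
]
```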
Screenshots
The response content has been absorbed into the thought block:
The reason: if the thought process is passed back to the LLM inside a tag as part of the main text, the LLM will attempt to mimic it and output a fake tag of its own.
Desired Solution you'd like
Add a model-level toggle (advanced setting), e.g. reinject_reasoning / include_reasoning_in_history, controlling whether reasoning is reinjected into the next-turn messages.
Alternatives Considered
Additional Context
Relevant code (for maintainers):
backend/open_webui/utils/middleware.py: process_messages_with_output() uses convert_output_to_messages(..., raw=True)
backend/open_webui/utils/misc.py: convert_output_to_messages(..., raw=True) appends reasoning as <think>...</think>

@ShirasawaSama commented on GitHub (Apr 2, 2026):
Why we need feature flags here
Open WebUI converts structured output back into the next-turn LLM messages (often with raw=True). That makes "what the UI shows" and "what the model sees" tightly coupled. A small upstream change (or a provider leaking <think>) can pollute prompts, increase token cost, and re-inject sensitive tool data. Feature flags let us keep official defaults while giving downstream forks a safe way to harden privacy and stability.

What to gate (and why)
reasoning / <think>-style content
tool outputs (function_call_output)
tool-call arguments (function_call.arguments)

Minimal flag design (pragmatic)
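As one possible reading of that design (the flag name and parameter plumbing are hypothetical, not existing Open WebUI settings), a per-model gate could look roughly like this:

```python
# Hypothetical sketch: gate reasoning reinjection behind a per-model flag.
# `params` stands in for the model's advanced settings; `reinject_reasoning`
# is an invented name, defaulting to today's behavior (on).
def should_reinject_reasoning(params: dict) -> bool:
    return bool(params.get("reinject_reasoning", True))


def append_output_item(content: str, item: dict, params: dict, raw: bool) -> str:
    if item.get("type") == "message":
        content += item.get("content", "")
    elif item.get("type") == "reasoning" and raw and should_reinject_reasoning(params):
        content += f"<think>{item.get('content', '')}</think>\n"
    return content
```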
@visig9 commented on GitHub (Apr 4, 2026):
Also, Gemma 4 needs it per the official requirement:
See: https://huggingface.co/google/gemma-4-E2B-it#3-multi-turn-conversations
@nekomiya-hinata commented on GitHub (Apr 4, 2026):
I would like to suggest a complementary approach to the toggle: placing the thinking content into a dedicated, separate field in the message payload (which could be configurable or fixed, e.g., reasoning or reasoning_content). Instead of concatenating the reasoning block into the raw content string with <think>...</think> tags (which directly causes the prompt pollution and imitation issues mentioned), Open WebUI could structure the multi-turn history like this:
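A minimal sketch of the proposed shape, assuming an OpenAI-compatible chat message and a hypothetical reasoning_content field:

```python
# Illustrative only: reasoning carried in a separate field instead of being
# concatenated into `content` as a <think> block.
history = [
    {"role": "user", "content": "Summarize this article."},
    {
        "role": "assistant",
        "content": "Here are the key points: ...",
        # hypothetical field name; some providers accept reasoning_content,
        # others reject unknown fields entirely (see the reply below)
        "reasoning_content": "The user wants a short summary; pick three points...",
    },
    {"role": "user", "content": "Now translate the summary into French."},
]
```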
Why this helps: the content field remains pristine, drastically reducing the chance of the LLM generating fake tags or malformed markdown.

@ShirasawaSama commented on GitHub (Apr 8, 2026):
I disagree; in practice, many models will simply throw an error if they encounter an unknown field.
@Classic298 commented on GitHub (Apr 8, 2026):
Also, wouldn't this be possible with a relatively simple filter, if anyone wants this?
@Classic298 commented on GitHub (Apr 8, 2026):
That's nice and all, but other models DO need it; that's why Open WebUI sends the reasoning content.
You can build yourself a simple filter that strips the reasoning content before sending it to the AI.
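A minimal sketch of such a filter, assuming Open WebUI's filter plugin interface (an inlet hook that rewrites the request body before it goes upstream); the regex and field handling are illustrative:

```python
import re


class Filter:
    # Sketch of an Open WebUI filter: inlet() runs on the request body before
    # it is forwarded to the provider, so <think> blocks can be stripped from
    # prior assistant turns here.
    THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

    def inlet(self, body: dict, __user__: dict | None = None) -> dict:
        for message in body.get("messages", []):
            if message.get("role") == "assistant" and isinstance(message.get("content"), str):
                message["content"] = self.THINK_RE.sub("", message["content"])
        return body
```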
@Classic298 commented on GitHub (Apr 8, 2026):
@ShirasawaSama
Why? Some providers and models need it - and if you have a model that strictly doesn't need the thinking sent back to the provider, couldn't a filter handle this easily?
@ShirasawaSama commented on GitHub (Apr 8, 2026):
In fact, for most models, not passing the thinking process does not cause any issues, so there is no need to return it.
Only a small number of models, such as those that support "extended thinking" (e.g., Claude), require the thinking process to be returned.
However, even for models that require the thinking process to be returned, it is not returned via the <think> tag, but through a separate field.

Embedding the thinking process into the assistant's body is incorrect under any circumstances, as it completely breaks the KV cache and causes token consumption to spike.
This is why a control option for different models is necessary.
@Classic298 commented on GitHub (Apr 8, 2026):
why would it break KV cache?
If I consistently send the same data and the same fields to the provider, it will cache that and reuse the cache.
Only modifications to those fields would break the KV cache.
Regardless, why can't a filter be used here? As you correctly said, some models need it. Yes, not all - but this is where a filter should perhaps come in and adjust the payload for the special needs of the model provider.
OpenAI also (for large companies) requires the safety identifier to be sent, otherwise you will quickly not have an account anymore. This could be done by Open WebUI via a flag/toggle - but shouldn't be. A filter is the better option here.
@Classic298 commented on GitHub (Apr 8, 2026):
Btw UI-rendered artifacts are never sent to the model to begin with
@ShirasawaSama commented on GitHub (Apr 8, 2026):
Can you give me an example of a model that requires the thinking process to be returned via the <think> tag? As far as I know, the Gemini, Claude, ChatGPT, DeepSeek, Kimi, Grok, and Qwen series should not return the thinking process.

Returning the thinking process via the <think> tag pollutes the context and causes the LLM to insert multiple <think> tags into the output to simulate thinking.

@Classic298 commented on GitHub (Apr 8, 2026):
Claude definitely requires returning the thinking to the provider across turns, and all providers that I know of require thinking to be sent back within the same turn at least (because if the model only thinks and plans and then makes a tool call, ending the generation, you need to send back the thinking it just generated, otherwise the model doesn't know what it planned to do and wants to do next).
So DeepSeek, Claude, Qwen, Kimi - all models with transparent thinking content - require it to be sent within the same turn so the model knows, for example, why it made that tool call; and Anthropic in particular requires it across turns as well.
@ShirasawaSama commented on GitHub (Apr 8, 2026):
https://platform.claude.com/docs/en/build-with-claude/extended-thinking
I checked Claude's documentation, and even in tool-call turns, the response should follow the original "thinking block" format rather than using the <think> tag.

I don't have much of an opinion on whether to return the thought process, but I do believe that the model's additional "thought" field should not be converted into a <think> tag, embedded within the assistant's content, and then sent back to the model.

9bd84258d0/backend/open_webui/utils/misc.py (L244-L247)
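For reference, Anthropic's extended-thinking format carries thinking as a typed content block alongside any tool use, roughly like the following (values shortened; see the linked docs for the exact schema):

```python
# Sketch of a Claude assistant turn with extended thinking and a tool call:
# thinking lives in its own typed block (with a signature that must be sent
# back verbatim), never inside the visible text as a <think> tag.
assistant_turn = {
    "role": "assistant",
    "content": [
        {
            "type": "thinking",
            "thinking": "The user wants the weather, so call get_weather first...",
            "signature": "EuYBCkQYAiJA...",  # opaque, provider-signed
        },
        {
            "type": "tool_use",
            "id": "toolu_01A...",
            "name": "get_weather",
            "input": {"city": "Paris"},
        },
    ],
}
```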
@ShirasawaSama commented on GitHub (Apr 8, 2026):

In fact, it's currently nearly impossible to use the Claude model for native tool calls in OwUI, because OwUI inevitably generates messages that contain only the tool call but have empty content, causing Claude to throw an error immediately.
I used a filter to process the Claude model’s input, removing these empty blocks, which allowed it to use native tool calls.
In other words, models that need the reasoning sent back to them are actually the rarer ones.
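A minimal sketch of such a cleanup, following the same inlet pattern as the earlier filter sketch (assumptions: an OpenAI-compatible message list, and that dropping the empty content field is enough):

```python
class Filter:
    # Sketch: remove the empty content field from assistant tool-call messages
    # so the upstream provider does not reject the request for an empty block.
    def inlet(self, body: dict, __user__: dict | None = None) -> dict:
        for message in body.get("messages", []):
            if (
                message.get("role") == "assistant"
                and message.get("tool_calls")
                and not message.get("content")
            ):
                message.pop("content", None)
        return body
```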
@ShirasawaSama commented on GitHub (Apr 8, 2026):
DeepSeek:
https://api-docs.deepseek.com/guides/thinking_mode
@ShirasawaSama commented on GitHub (Apr 8, 2026):
Gemini 3 Tool Calls:
Gemini requires that only the Thought Signatures be returned, not the thought content.
@ShirasawaSama commented on GitHub (Apr 8, 2026):
ChatGPT:
https://developers.openai.com/api/docs/guides/reasoning?example=planning#keeping-reasoning-items-in-context
When using the Responses API, it is recommended to return the response as-is:
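A minimal sketch of that pattern with the OpenAI Python SDK, where the previous response's output items (including reasoning items) are passed back unchanged in the next request's input; the model name and prompts are placeholders:

```python
from openai import OpenAI

client = OpenAI()

input_items = [{"role": "user", "content": "Plan a three-step refactor for this module."}]
first = client.responses.create(model="gpt-5", input=input_items)  # placeholder model

# Keep the reasoning items in context by passing the output back as-is,
# rather than re-serializing them into <think> text.
input_items += first.output
input_items.append({"role": "user", "content": "Apply step 1."})
second = client.responses.create(model="gpt-5", input=input_items)
print(second.output_text)
```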
@ShirasawaSama commented on GitHub (Apr 8, 2026):
So I think the key point is: don't inject the reasoning content into the assistant's output using the <think> tag; instead, return the LLM's data as-is as much as possible.

@Classic298 commented on GitHub (Apr 8, 2026):
Yes, I agree about how to inject it, but I think the current behavior on whether it is returned to the API at all can stay unchanged.
@8bit-coder commented on GitHub (Apr 9, 2026):
Adding to this that I'm experiencing bugs with multi-message Gemma 4 conversations as well. The thinking tag getting reinserted into the prompt along with the previous chain of thought causes huge issues and breaks the model's output. Is there a temporary fix for this?
@ShirasawaSama commented on GitHub (Apr 9, 2026):
Add a "pass" between elif item_type == 'reasoning': and if raw:

9bd84258d0/backend/open_webui/utils/misc.py (L233-L242)
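The effect of the workaround is to make the reasoning branch of convert_output_to_messages a no-op, so reasoning items are never wrapped in <think> tags; a rough sketch of the shape (not the actual file, see the commit link above):

```python
def build_assistant_content(items: list[dict]) -> str:
    """Sketch of the output-items-to-text step with the workaround applied."""
    content = ""
    for item in items:
        item_type = item.get("type")
        if item_type == "message":
            content += item.get("content", "")
        elif item_type == "reasoning":
            # workaround: skip reasoning items entirely so they are never
            # wrapped in <think> tags and reinjected into the prompt
            pass
    return content
```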
@8bit-coder commented on GitHub (Apr 13, 2026):

I added the pass statement and it still has the issue. I installed vim in the container, edited the file on line 233, saved it, and opened it again to validate, and it still results in the same behavior. I even tried stopping and starting the container again, but that undoes the changes, so I have to make the changes while the container is live.
@Classic298 commented on GitHub (Apr 13, 2026):
@8bit-coder editing in the container and then restarting it with down and up -d doesn't really persist the changes in my experience
@8bit-coder commented on GitHub (Apr 13, 2026):
Should I edit the container data itself and then bring it up?
@Classic298 commented on GitHub (Apr 13, 2026):
Yes, via a custom Dockerfile or through a replacement script on startup.
@yrro commented on GitHub (Apr 22, 2026):
Here's a filter that strips <think>...</think> from assistant messages:

https://gist.github.com/yrro/b0f2765ea55ae3414e06b319dd07ae8e

Uncomment the print statement and look at your logs to check that it's working properly.

@itsHenry35 commented on GitHub (Apr 26, 2026):
Yeah, this is indeed needed when DeepSeek reasoning is on with native tool calling (it can't be set to default, as in that case the model doesn't know there are tools to call), otherwise it returns an error.
@JoeEnderman commented on GitHub (May 4, 2026):
That fixes it. Thank you! Now Gemma 4 26b a4b works amazingly.