mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[GH-ISSUE #23343] issue: Model Response is injected into thinking section, no visible output #35485
Originally created by @seppel123 on GitHub (Apr 2, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23343
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.8.10
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24
Browser (if applicable)
No response
Confirmation
Expected Behavior
The model's response should be displayed to the user
Actual Behavior
Sometimes the response gets injected into the thinking section, and the user does not get any visible output.
The Regenerate button or Continue Response does not solve the problem.
When I open the Thoughts section by clicking the arrow, I can see the thinking process followed by the model response.
What the user sees:
What's in the Thoughts section:
Steps to Reproduce
I don't know the reason that leads to this problem; sometimes it works normally and sometimes I get this behavior.
Logs & Screenshots
Additional Information
No response
@Classic298 commented on GitHub (Apr 2, 2026):
Cannot reproduce. We need proper reproduction steps. Even when going directly to the Gemini API on the OpenAI beta endpoint I cannot reproduce, neither via OpenRouter nor via LiteLLM.
Do you have pipes for connectivity?
Do you have ANY filters?
Did you configure anything on advanced Params?
Cannot reproduce with Gemini, Claude, GPT, or anything else. We need reproduction steps.
This looks like a custom model. What did you configure there? Custom params? Filters? Pipes?
Any modifications?
How and where do you get your models from?
@seppel123 commented on GitHub (Apr 2, 2026):
Model Provider:
IONOS
The model used in the image is .openai/gpt-oss-120b from IONOS.
Pipe: no pipe!
Filter: no filter installed!
Actions:
Export to Excel
Export to Word
Export to PDF
Model Settings:
Function Calling > native
All other on Standard
Capabilities:
File Upload
File Context
Web Search
Usage
Citations
Status Updates
Builtin Tools
Default Features
Web_Search
Builtin Tools
All
No Knowledge Base in use
No special modifications
@Classic298 commented on GitHub (Apr 2, 2026):
Ok. Please try another provider. Maybe the issue is upstream, if it happens with two different models as you said.
Either Requesty or IONOS is improperly handling the reasoning tags.
And if you can't easily switch providers, log the output coming in from your provider(s) to see what they send.
@seppel123 commented on GitHub (Apr 2, 2026):
How can I trace this information? In my Docker logs I can't see any abnormalities when it happens.
@Classic298 commented on GitHub (Apr 2, 2026):
Log the output using a filter, or run Open WebUI with debug logging.
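If it helps, a hedged sketch of how that could look with Docker, assuming the `GLOBAL_LOG_LEVEL` environment variable documented by Open WebUI (the image tag, ports, and volume flags here are placeholders; adapt them to your own deployment):

```shell
# Sketch only: restart the container with debug logging enabled.
# GLOBAL_LOG_LEVEL is Open WebUI's documented log-level variable;
# adjust image/tag and your usual volume/port flags as needed.
docker run -d \
  -e GLOBAL_LOG_LEVEL=DEBUG \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Then watch the raw stream chunks coming back from the provider:
docker logs -f open-webui
```

With debug logging on, the incoming provider chunks should appear in the container logs, which is what you need to capture here.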
@seppel123 commented on GitHub (Apr 3, 2026):
I reproduced a broken response. This time it isn't injected into the thinking section: the web_search returned an extremely big response (360,000 tokens), which is too much for the model and breaks it; that's why I don't get an output.
This is not the same problem as shown above, but how can I avoid this one?
web_search is jina.at
Search Result Count: 3
Concurrent Requests: 10
Bypass Embedding and Retrieval: on (full context mode) << I think this is the problem?
Bypass Web Loader: off
Trust Proxy Environment: off
Bypass Embeddings and Retrieval will push the whole load into the model and break the context window.
Am I right?
How do I avoid that without using embedding on web_search?
@Classic298 commented on GitHub (Apr 3, 2026):
Yes. If you bypass embedding and retrieval, you'll inject the whole thing.
The web_search native tool does not use embedding anyway. Do you get 360,000 tokens from one single web_search request?
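To illustrate the arithmetic here, why a ~360,000-token payload cannot fit, a minimal sketch using the rough 4-characters-per-token heuristic (this is not Open WebUI code; the function names are made up for illustration):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def fits_context(documents: list[str], context_window: int, reserve: int = 4096) -> bool:
    # Leave `reserve` tokens for the prompt, chat history, and the answer.
    total = sum(estimate_tokens(d) for d in documents)
    return total <= context_window - reserve

# A ~360,000-token web-search payload vs. a 128k-token context window:
payload = ["x" * 4 * 360_000]          # ~360k estimated tokens
print(fits_context(payload, 128_000))  # the injection alone overflows
```

With "Bypass Embedding and Retrieval" off, retrieval would instead select only the top-matching chunks, keeping the injected context far below the window.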
@seppel123 commented on GitHub (Apr 3, 2026):
I'm confused by the logfile, so I'm uploading it:
log.txt
Can you take a look?
@Classic298 commented on GitHub (Apr 3, 2026):
I reviewed it.
What I can confirm from your log.txt: I can see malformed output framing (`<think<|message|>`) and empty final assistant message chunks while reasoning/tool payload exists, under very large web-search context injection.
About your config question: yes, with Bypass Embeddings and Retrieval in full-context style, large web payloads can flood the model context and increase the chance of malformed/empty final output.
Based on the provided log, this currently looks more like an upstream/provider-side or model issue than an Open WebUI bug. At this stage, we should treat this as an upstream/stream issue or model-dependent behaviour unless we can reproduce it with a known-good provider under similar conditions.
@seppel123 commented on GitHub (Apr 17, 2026):
This is the doc for the response from OSS120B on IONOS: https://docs.ionos.com/cloud/ai/ai-model-hub/models/llms/openai-gpt-oss-120b#reasoning-example
Is their response block right, or do they respond in the wrong format?
Thx!
@seppel123 commented on GitHub (Apr 22, 2026):
Still there even with update 0.9.1
@Classic298 commented on GitHub (Apr 22, 2026):
Looking at the IONOS docs you linked, their non-streaming response is spec-adjacent (it returns both `reasoning` and `reasoning_content` fields, plus `content` separately — Open WebUI handles all three at `utils/middleware.py:4066-4070`). That's fine.
But they don't document their streaming format at all — the example only shows the final usage chunk. That's where this breaks.
The strongest clue is from your earlier log: the assistant output contained `<think<|message|>…`. That's not a normal OpenAI stream — it's a fragment of OpenAI's Harmony chat format (GPT-OSS's native output format uses tokens like `<|start|>assistant<|channel|>analysis<|message|>…<|end|><|start|>assistant<|channel|>final<|message|>…`).
What you're seeing is IONOS's Harmony-to-OpenAI converter leaking raw Harmony tokens into the stream and partially munging `<|channel|>analysis` into `<think`. That corrupt opening tag means Open WebUI's `<think>` parser captures the whole rest of the response (reasoning and final answer) as reasoning, and no `</think>` ever arrives to close it — which is exactly the "everything is in the thinking box, nothing in the main message" symptom.
So: upstream / IONOS provider-side bug, not Open WebUI. Their Harmony-format serializer is broken in streaming mode.
To confirm independently:
Try the same openai/gpt-oss-120b via OpenRouter, Groq, or a self-hosted vLLM/llama.cpp endpoint. If it's fine there, it's IONOS.
Try a non-GPT-OSS reasoning model on IONOS (e.g., Llama 3.3 70B). If that is fine but GPT-OSS isn't, it's specifically IONOS's GPT-OSS/harmony handling.
Report it to IONOS support with one of your debug-log captures showing <think<|message|> — that's unambiguous evidence of a broken harmony decoder.
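For anyone wanting to see the failure mode concretely, here's a minimal sketch of a `<think>…</think>` splitter (not Open WebUI's actual parser; `split_reasoning` is a made-up illustration) showing how a mangled, never-closed open tag swallows the final answer into the reasoning:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer) on <think>...</think> tags.

    Minimal sketch, not Open WebUI's actual parser: if a <think> block is
    opened but never closed, everything after it is treated as reasoning.
    """
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if m:  # well-formed block: reasoning inside, answer after the close tag
        return m.group(1), text[m.end():].lstrip()
    m = re.search(r"<think", text)
    if m:  # opened but never closed: the whole remainder becomes reasoning
        return text[m.start():], ""
    return "", text

# Well-formed stream: reasoning and answer separate.
print(split_reasoning("<think>plan</think>The answer is 4."))
# Mangled Harmony leak: corrupted open tag, no </think> ever arrives,
# so the final answer disappears into the thinking box.
print(split_reasoning("<think<|message|>plan... The answer is 4."))
```

The second call returns an empty answer with everything classed as reasoning — the exact symptom from the original report.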
@seppel123 commented on GitHub (Apr 22, 2026):
Thank you!!!
I contacted IONOS support; they will investigate the issue, and maybe they can correct the response format.
I will share the results.
@seppel123 commented on GitHub (Apr 24, 2026):
Here is the answer from IONOS support; I'm not sure now whether this is possible to fix.
Open WebUI says IONOS is the problem, and IONOS says Open WebUI is the problem :)
[Ticket#:207xxxx]
Hi,
we are compatible with the OpenAI API, but we do not fully implement everything that this API provides.
Potentially also not what OpenWebUI expects in this case.
The stream sends chunks in two separate waves using different fields:
Wave 1 — REASONING chunks (delta.reasoning):
Wave 2 — CONTENT chunks (delta.content):
The Exact Problem
The `delta.reasoning` field is non-standard. The official OpenAI streaming spec has no `reasoning` field in deltas. OpenWebUI handles it by wrapping those chunks in a visible `<think>`…`</think>` thinking block — but it never properly closes that block before the `delta.content` chunks arrive.
The empty transition chunk `{"delta": {"content": ""}}` between the two waves fails to signal to OpenWebUI that thinking is over. So OpenWebUI either:
Streaming — What it should look like
The tags must be inline inside `delta.content`, never in a separate `delta.reasoning` field:
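To make the two shapes concrete, here are illustrative chunk payloads for both formats IONOS describes (hand-written examples, not captured from a real IONOS stream):

```python
import json

# Wave-style stream (what IONOS currently sends): reasoning and content
# arrive in separate delta fields, with an empty transition chunk between.
wave_stream = [
    {"delta": {"reasoning": "First, "}},
    {"delta": {"reasoning": "think it through."}},
    {"delta": {"content": ""}},             # empty transition chunk
    {"delta": {"content": "The answer is 4."}},
]

# Inline-tag stream (what IONOS proposes instead): reasoning is wrapped
# in <think>...</think> inside the standard delta.content field.
inline_stream = [
    {"delta": {"content": "<think>"}},
    {"delta": {"content": "First, think it through."}},
    {"delta": {"content": "</think>"}},
    {"delta": {"content": "The answer is 4."}},
]

for chunk in wave_stream:
    print(json.dumps(chunk))
```

As the next comment argues, Open WebUI actually accepts both shapes; the distinction only matters if the serializer producing them is broken.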
@Classic298 commented on GitHub (Apr 24, 2026):
@seppel123 — commented by Claude on behalf of @Classic298 (I'm currently unable to comment directly myself).
Thanks for forwarding the IONOS response. I traced their described stream through the open-webui source (dev branch) and their diagnosis is incorrect on two of three points:
1. `delta.reasoning` is not "non-standard" relative to how the ecosystem actually works today. DeepSeek uses `reasoning_content`, OpenRouter / xAI / Qwen / MiniMax use `reasoning`, Anthropic passthroughs use `thinking` — Open WebUI handles all three at `backend/open_webui/utils/middleware.py:4108-4112`. IONOS emitting `delta.reasoning` is fine and is not the bug.
2. The claim that "empty `{"delta": {"content": ""}}` fails to close the thinking block" is factually wrong. Open WebUI transitions from reasoning to content on the first non-empty content chunk, at `middleware.py:4145-4170`. The empty chunk is a no-op — which is correct, because there is no "empty content signals thinking is over" convention in OpenAI's streaming spec that a client should honor, and IONOS didn't invent one either.
Walking the exact stream they described through the code:
- `{"delta":{"reasoning":"Answer"}}` → opens a reasoning item, accumulates "Answer".
- `{"delta":{"content":""}}` → `value = ""`, falsy; `if value:` skipped. No-op. Reasoning stays open (correct).
- `{"delta":{"content":"The"}}` → `value = "The"`, truthy. Enters the close branch at 4145: the last item is a reasoning item with `attributes.type == 'reasoning_content'`, so it marks it `status='completed'`, sets `ended_at`/`duration`, appends a fresh message item, and streams "The" into the message.
So the reasoning → content transition works for the exact stream IONOS described. No bug on our side.
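That walkthrough can be modeled in a few lines (a sketch of the described behaviour, not the actual `middleware.py` code; `consume_stream` is a made-up name):

```python
def consume_stream(chunks):
    """Accumulate an OpenAI-style delta stream into (reasoning, content).

    Minimal sketch of the transition walked through above: reasoning
    chunks accumulate into the reasoning side; an EMPTY content chunk is
    falsy and therefore a no-op; the first non-empty content chunk is
    what switches output over to the message.
    """
    reasoning, content = [], []
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if value := delta.get("reasoning"):
            reasoning.append(value)
        value = delta.get("content")
        if value:  # "" is falsy: the empty transition chunk is skipped
            content.append(value)
    return "".join(reasoning), "".join(content)

# The exact stream IONOS described:
stream = [
    {"delta": {"reasoning": "Answer"}},
    {"delta": {"content": ""}},     # no-op; reasoning stays open
    {"delta": {"content": "The"}},  # first non-empty content starts the message
]
print(consume_stream(stream))  # -> ('Answer', 'The')
```

Run against the exact three-chunk stream, the reasoning and the message come out cleanly separated, matching the walkthrough above.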
3. The actual root cause from the original log is a Harmony format token leak, and that IS on IONOS's side. GPT-OSS natively outputs Harmony tokens like `<|start|>assistant<|channel|>analysis<|message|>…<|end|><|start|>assistant<|channel|>final<|message|>…`. In your earlier log we saw `<think<|message|>…` in the assistant output — that's IONOS's Harmony-to-OpenAI converter mangling `<|channel|>analysis` into `<think`, producing a malformed opening tag. Open WebUI's reasoning-tag detector (`middleware.py:153-162`) then keeps the reasoning block open because no `</think>` ever arrives. That's exactly the "everything is in the thinking box, nothing in the main message" symptom.
Their proposed "fix" (emitting `<think>`/`</think>` inline inside `delta.content`) would also work — Open WebUI accepts that via the reasoning-tag detector listed above. But they don't need to change format; they need to stop their Harmony serializer from leaking tokens into the stream.
If you want to send something back to IONOS support, the one-line version is: the streaming `delta.reasoning` → `delta.content` transition works correctly in Open WebUI; the actual bug is the Harmony token leak (`<think<|message|>`) their serializer produces, which is unambiguously on their side.