[GH-ISSUE #23343] issue: Model Response is injected into thinking section, no visible output
Originally created by @seppel123 on GitHub (Apr 2, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23343
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.8.10
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24
Browser (if applicable)
No response
Confirmation
Expected Behavior
The model's response should be displayed to the user
Actual Behavior
Sometimes the response gets injected into the thinking section, and the user does not get any visible output.
The Regenerate button or Continue Response does not solve the problem.
When I open the Thoughts section by clicking the arrow, I can see the thinking process followed by the model response.
What the user sees: (screenshot)
What's in the Thoughts section: (screenshot)
Steps to Reproduce
I don't know the reason that leads to this problem; sometimes it works normally and sometimes I get this behavior.
Logs & Screenshots
Additional Information
No response
@Classic298 commented on GitHub (Apr 2, 2026):
Cannot reproduce. We need proper reproduction steps. Even when going directly to the Gemini API on the OpenAI beta endpoint, I cannot reproduce it, neither via OpenRouter nor via LiteLLM.
Do you have pipes for connectivity?
Do you have ANY filters?
Did you configure anything on advanced Params?
Cannot reproduce with Gemini, Claude, GPT, or anything else. We need reproduction steps.
This looks like a custom model. What did you configure there? Custom params? Filters? Pipes?
Any modifications?
How and where do you get your models from?
@seppel123 commented on GitHub (Apr 2, 2026):
Model Provider:
IONOS and Requesty
It happens with models from both providers.
The model used in the screenshot is .openai/gpt-oss-120b from IONOS.
Pipe: no pipe!
Filter: no filter installed!
Actions:
Export to Excel
Export to Word
Export to PDF
Model Settings:
Function Calling > native
Everything else on default
Capabilities:
File Upload
File Context
Web Search
Usage
Citations
Status Updates
Builtin Tools
Default Features
Web_Search
Builtin Tools
All
No Knowledge Base in use
No special modifications
@Classic298 commented on GitHub (Apr 2, 2026):
OK. Please try with another provider. Maybe the issue is upstream, if it happens with two different models as you said.
Either Requesty or IONOS is improperly handling the reasoning tags.
And if you can't switch providers easily, log the incoming output from your provider(s) to see what they actually send.
@seppel123 commented on GitHub (Apr 2, 2026):
How can I trace this information? In my Docker logs I can't see any abnormalities when it happens.
@Classic298 commented on GitHub (Apr 2, 2026):
Log the output using a filter, or run Open WebUI with debug logging.
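(For reference: debug logging can be enabled via Open WebUI's `GLOBAL_LOG_LEVEL=DEBUG` environment variable. Below is a minimal sketch of a logging filter following the filter plugin convention of `inlet`/`outlet` hooks; `print` output lands in the container logs. Treat it as illustrative, not a drop-in.)

```python
"""
title: Provider I/O Logger
description: Minimal sketch of a logging filter (illustrative, not a drop-in).
"""
import json


class Filter:
    def inlet(self, body: dict) -> dict:
        # Request as it leaves Open WebUI for the provider.
        print("INLET:", json.dumps(body)[:2000])
        return body

    def outlet(self, body: dict) -> dict:
        # Assembled response as it comes back; inspect this for stray
        # <think> / <|message|> fragments in the message content.
        print("OUTLET:", json.dumps(body)[:2000])
        return body
```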
@seppel123 commented on GitHub (Apr 3, 2026):
I did reproduce a broken response. This time it isn't injected into the thinking section: the web_search returned an extremely big response (360,000 tokens), and that's too much for the model and breaks it, which is why I don't get an output.
This is not the same problem as shown above, but how can I avoid this one?
web_search is jina.at
Search Result Count: 3
Concurrent Requests: 10
Bypass Embedding and Retrieval: on (full-context mode) << I think this is the problem?
Bypass Web Loader: off
Trust Proxy Environment: off
Bypass Embeddings and Retrieval will push the whole load into the model and break the context window.
Am I right?
How can I avoid that without using embedding on web_search?
@Classic298 commented on GitHub (Apr 3, 2026):
Yes. If you bypass embedding and retrieval, you'll inject the whole thing.
The web_search native tool does not use embedding anyway. Do you get 360,000 tokens from one single web_search request?
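(As an illustration of the failure mode: if retrieved pages are injected verbatim, nothing caps the payload. Below is a minimal sketch of the kind of token-budget guard that would prevent this; `estimate_tokens`, `cap_payload`, and the budget numbers are hypothetical, not Open WebUI's actual pipeline.)

```python
# Sketch: cap a web-search payload before it is injected into the prompt
# in full-context mode. All names and numbers here are illustrative.

TOKEN_BUDGET = 30_000   # leave headroom inside a ~128k context window
CHARS_PER_TOKEN = 4     # rough heuristic for English text


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def cap_payload(documents: list[str], budget: int = TOKEN_BUDGET) -> list[str]:
    kept, used = [], 0
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            remaining_chars = (budget - used) * CHARS_PER_TOKEN
            if remaining_chars > 0:
                kept.append(doc[:remaining_chars])  # truncate the last document
            break
        kept.append(doc)
        used += cost
    return kept
```

A 360,000-token result against a 128k-token context window is over 2.8x the model's capacity, so without a guard like this the request is guaranteed to overflow.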
@seppel123 commented on GitHub (Apr 3, 2026):
I'm confused by the logfile, so I'm uploading it:
log.txt
Can you take a look?
@Classic298 commented on GitHub (Apr 3, 2026):
I reviewed it.
What I can confirm from your log.txt: there is malformed output framing (<think<|message|>) and empty final assistant message chunks while reasoning/tool payload exists, under very large web-search context injection.
About your config question: yes, with Bypass Embeddings and Retrieval in full-context style, large web payloads can flood the model context and increase the chance of malformed/empty final output.
Recommended mitigation: based on the provided log, this currently looks more likely to be an upstream/provider-side or model issue than an Open WebUI bug, so we should treat it as an upstream stream issue or model-dependent behaviour unless we can reproduce it with a known-good provider under similar conditions.
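(A minimal sketch of how one might scan such a saved stream log for the framing problems described above; the patterns checked are assumptions based on the quoted fragment, and `log.txt` stands in for the uploaded file.)

```python
import re


def check_framing(raw: str) -> list[str]:
    """Flag reasoning-tag framing problems in a saved provider stream log."""
    problems = []
    if re.search(r"<think(?!>)", raw):  # "<think" not closed as "<think>"
        problems.append("malformed <think> open tag (e.g. '<think<|message|>')")
    opens, closes = raw.count("<think>"), raw.count("</think>")
    if opens != closes:
        problems.append(f"unbalanced think tags: {opens} opened, {closes} closed")
    # gpt-oss "Harmony" special tokens should never appear in plain text output
    if "<|message|>" in raw or "<|channel|>" in raw:
        problems.append("raw Harmony special tokens leaked into the text")
    return problems


if __name__ == "__main__":
    with open("log.txt", encoding="utf-8") as f:
        for issue in check_framing(f.read()):
            print(issue)
```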
@seppel123 commented on GitHub (Apr 17, 2026):
This is the Doc for the Response from OSS120B on IONOS: https://docs.ionos.com/cloud/ai/ai-model-hub/models/llms/openai-gpt-oss-120b#reasoning-example
Is the response block correct, or are they responding in the wrong format?
Thx!
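(For comparison: on OpenAI-compatible streaming APIs, reasoning is usually carried in a field separate from content; field names vary by provider, with `reasoning_content` being one common convention, and the linked IONOS doc showing their variant. A hypothetical illustration of a well-formed delta versus the broken framing seen in the log:)

```python
# Hypothetical streaming deltas, for illustration only; exact field names
# depend on the provider (see the IONOS doc linked above).

well_formed_delta = {
    "choices": [{
        "delta": {
            "reasoning_content": "Let me work through this...",  # thinking text
            "content": None,                                     # answer comes later
        }
    }]
}

broken_delta = {
    "choices": [{
        "delta": {
            # Reasoning markers leak into content. A client that opens a
            # <think> block on this and never sees a closing tag will render
            # the entire answer inside the collapsed "Thoughts" section.
            "content": "<think<|message|>Let me work through this...",
        }
    }]
}
```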