[GH-ISSUE #23343] issue: Model Response is injected into thinking section, no visible output #19956

Closed
opened 2026-04-20 02:31:33 -05:00 by GiteaMirror · 10 comments

Originally created by @seppel123 on GitHub (Apr 2, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23343

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.8.10

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The model's response should be displayed to the user

Actual Behavior

Sometimes the response gets injected into the thinking section, and the user does not get any visible output.
The Regenerate button or Continue Response does not solve the problem.

When I open the Thoughts section by clicking the arrow, I can see the thinking process followed by the model response.
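For illustration only, here is a minimal sketch (not Open WebUI's actual parser) of how splitting a response on `<think>...</think>` delimiters behaves when the closing tag never arrives: everything, including the answer, ends up in the thinking part and the visible reply stays empty.

```python
# Minimal sketch (illustrative, not Open WebUI's actual code): splitting a
# response on <think>...</think> delimiters. If the provider never emits the
# closing </think>, the whole output -- including the answer -- is treated as
# reasoning and the visible reply stays empty.
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (thinking, answer) for a response using <think> delimiters."""
    if not text.startswith("<think>"):
        return "", text
    body = text[len("<think>"):]
    if "</think>" in body:
        thinking, answer = body.split("</think>", 1)
        return thinking.strip(), answer.strip()
    # Closing tag missing: everything lands in the thinking section.
    return body.strip(), ""


print(split_reasoning("<think>plan the reply</think>Hello!"))  # ('plan the reply', 'Hello!')
print(split_reasoning("<think>plan the reply ... Hello!"))     # ('plan the reply ... Hello!', '')
```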

What the user sees:

![Image](https://github.com/user-attachments/assets/24b7b290-5de5-4828-a16d-e07f5676750a)

What's in the Thoughts section:

![Image](https://github.com/user-attachments/assets/d0dca7bb-c875-4557-ba1a-c1f66cef2ecd)

Steps to Reproduce

I don't know what triggers this problem; sometimes it works normally and sometimes I get this behavior.

Logs & Screenshots


Additional Information

No response

GiteaMirror added the bug label 2026-04-20 02:31:33 -05:00

@Classic298 commented on GitHub (Apr 2, 2026):

Cannot reproduce. We need proper reproduction steps. Even when going directly to the Gemini API on the OpenAI beta endpoint I cannot reproduce it, neither via OpenRouter nor via LiteLLM.

Do you have pipes for connectivity?
Do you have ANY filters?
Did you configure anything in Advanced Params?

Cannot reproduce with Gemini, Claude, GPT, or anything else. We need reproduction steps.

This looks like a custom model. What did you configure there? Custom params? Filters? Pipes?

Any modifications?

How and where do you get your models from?


@seppel123 commented on GitHub (Apr 2, 2026):

Model Provider:
IONOS and Requesty
It happens with models from both providers.
The model used in the screenshot is .openai/gpt-oss-120b from IONOS.

Pipe: no pipe!

Filter: no filter installed!

Actions:
Export to Excel
Export to Word
Export to PDF

Model Settings:
Function Calling > native
All others on Standard

Capabilities:
File Upload
File Context
Web Search
Usage
Citations
Status Updates
Builtin Tools

Default Features
Web_Search

Builtin Tools
All

No Knowledge Base in use

No special modifications


@Classic298 commented on GitHub (Apr 2, 2026):

OK. Please try with another provider. Maybe the issue comes from upstream if, as you said, it happens with two different models:
either Requesty or IONOS is improperly handling the reasoning tags.

And if you can't switch providers easily, log the output coming in from your provider(s) to see what it contains.


@seppel123 commented on GitHub (Apr 2, 2026):

How can I trace this information? In my Docker logs I can't see any abnormalities when it happens.


@Classic298 commented on GitHub (Apr 2, 2026):

Log the output using a filter, or run Open WebUI with debug logging enabled.
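For anyone following along, here is a minimal sketch of such a logging filter, assuming Open WebUI's documented Python filter hooks (`inlet` for the outgoing request body, `outlet` for the completed response body); it is an illustration, not an official function. Debug logging can alternatively be enabled via the `GLOBAL_LOG_LEVEL=DEBUG` environment variable on the container.

```python
"""Illustrative logging filter sketch (not an official Open WebUI function).

Assumes the documented filter hooks: `inlet` receives the request body that is
about to be sent to the provider, `outlet` receives the finished response body.
"""
import json
import logging

log = logging.getLogger("raw_io_filter")
log.setLevel(logging.DEBUG)


class Filter:
    def inlet(self, body: dict) -> dict:
        # What goes out: messages, params, injected web-search context, etc.
        log.debug("REQUEST: %s", json.dumps(body, ensure_ascii=False)[:5000])
        return body

    def outlet(self, body: dict) -> dict:
        # What came back: check whether the answer text ended up inside the
        # reasoning/thinking block or in the regular assistant content.
        log.debug("RESPONSE: %s", json.dumps(body, ensure_ascii=False)[:5000])
        return body
```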


@seppel123 commented on GitHub (Apr 3, 2026):

I did reproduce a broken response. This time it isn't injected into the thinking section; the web_search returned an extremely big response (360,000 tokens), which is too much for the model and breaks it, so I don't get any output.

This is not the same problem as shown above, but how can I avoid this one?
Web search engine: jina.at
Search Result Count: 3
Concurrent Requests: 10
Bypass Embedding and Retrieval: on (full context mode) << I think this is the problem?
Bypass Web Loader: off
Trust Proxy Environment: off

Bypass Embeddings and Retrieval will push the whole payload into the model and break the context window.
Am I right?

How can I avoid that without using embedding on web_search?


@Classic298 commented on GitHub (Apr 3, 2026):

  1. Yes. If you bypass embedding and retrieval, you'll inject the whole thing.

  2. The native web_search tool does not use embedding anyway. Do you get 360,000 tokens from one single web_search request?


@seppel123 commented on GitHub (Apr 3, 2026):

I'm confused by the logfile, so I'm uploading it:
[log.txt](https://github.com/user-attachments/files/26457347/log.txt)

Can you take a look?


@Classic298 commented on GitHub (Apr 3, 2026):

I reviewed it

What I can confirm from your log.txt:

  • Multiple assistant message outputs are empty ("text": "") while reasoning/tool sections are present.
  • There is at least one malformed assistant output chunk containing control-like text: "<think<|message|>...".
  • Prompt/token load is very high (~111k–112k tokens in this chat), and search_web is injecting very large raw page content (including boilerplate/cookie/CAPTCHA pages).

About your config question: yes, with Bypass Embeddings and Retrieval in full-context style, large web payloads can flood the model context and increase the chance of malformed/empty final output.

Recommended mitigation:

  • Lower Search Result Count (start with 2–3).
  • Disable full raw-context behavior / avoid bypass mode for very large pages.
  • Add filtering/truncation for boilerplate-heavy pages before model injection - the model doesn't need all the other information on a website

Based on the provided log, this currently looks more like an upstream/provider-side or model issue than an Open WebUI bug.

I can see malformed output framing (<think<|message|>) and empty final assistant message chunks while reasoning/tool payload exists, under very large web-search context injection.

At this stage, we should treat this as an upstream/stream issue or model-dependent behaviour unless we can reproduce it with a known-good provider under similar conditions.
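As a rough illustration of the filtering/truncation suggestion in the mitigation list above, a sketch using the same `inlet` filter hook as before; the character limit and message handling are illustrative, not a recommended production setting.

```python
# Rough sketch (illustrative only): cap over-long message content before it is
# sent to the model, so a single huge raw web page cannot flood the context
# window. The limit below is arbitrary -- tune it to your model's context size.
MAX_CHARS = 60_000


class Filter:
    def inlet(self, body: dict) -> dict:
        for message in body.get("messages", []):
            content = message.get("content")
            if isinstance(content, str) and len(content) > MAX_CHARS:
                message["content"] = (
                    content[:MAX_CHARS]
                    + "\n\n[truncated: injected content exceeded the configured limit]"
                )
        return body
```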


@seppel123 commented on GitHub (Apr 17, 2026):

This is the documentation for the response format of gpt-oss-120b on IONOS: https://docs.ionos.com/cloud/ai/ai-model-hub/models/llms/openai-gpt-oss-120b#reasoning-example

Is the response block correct, or are they responding in the wrong format?

Thanks!
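One way to check that yourself is to stream a request directly against the provider's OpenAI-compatible endpoint and inspect where the reasoning and the final answer arrive. The sketch below uses the standard `openai` Python client; the base URL, API key, and model name are placeholders, not verified values.

```python
# Illustrative probe (placeholders, not verified values): stream one request
# straight from the provider and print every populated delta field, so you can
# see whether reasoning arrives inline (e.g. <think> tags inside `content`) or
# in a separate field of the delta.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.invalid/v1", api_key="sk-...")

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # placeholder; use the provider's actual model id
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    # model_dump(exclude_none=True) prints only the fields the provider filled in.
    print(delta.model_dump(exclude_none=True))
```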


Reference: github-starred/open-webui#19956