mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[GH-ISSUE #23343] issue: Model Response is injected into thinking section, no visible output #35485
Originally created by @seppel123 on GitHub (Apr 2, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23343
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.8.10
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24
Browser (if applicable)
No response
Confirmation
Expected Behavior
The model's response should be displayed to the user
Actual Behavior
Sometimes the response gets injected into the thinking section, and the user does not get any visible output.
The Regenerate button or Continue Response does not solve the problem.
When I open the Thoughts section by clicking the arrow, I can see the thinking process followed by the model response.
What the user sees:
What's in the Thoughts section:
Steps to Reproduce
I don't know the reason that leads to this problem; sometimes it works normally and sometimes I get this behavior.
Logs & Screenshots
Additional Information
No response
@Classic298 commented on GitHub (Apr 2, 2026):
Cannot reproduce. We need proper reproduction steps. Even when going directly to the Gemini API on the OpenAI beta endpoint I cannot reproduce, neither via OpenRouter nor via LiteLLM.
Do you have pipes for connectivity?
Do you have ANY filters?
Did you configure anything on advanced Params?
Cannot reproduce with Gemini, Claude, GPT, or anything else. We need reproduction steps.
This looks like a custom model. What did you configure there? Custom params? Filters? Pipes?
Any modifications?
How and where do you get your models from?
@seppel123 commented on GitHub (Apr 2, 2026):
Model Provider:
IONOS
The model used in the image is .openai/gpt-oss-120b from IONOS.
Pipe: no pipe!
Filter: no filter installed!
Actions:
Export to Excel
Export to Word
Export to PDF
Model Settings:
Function Calling > native
All other on Standard
Capabilities:
File Upload
File Context
Web Search
Usage
Citations
Status Updates
Builtin Tools
Default Features
Web_Search
Builtin Tools
All
No Knowledge Base in use
No special modifications
@Classic298 commented on GitHub (Apr 2, 2026):
Ok. Please try another provider. Maybe the issue is upstream, if it happens with two different models as you said.
Either Requesty or IONOS is improperly handling the reasoning tags.
And if you can't easily switch providers, log the output coming in from your provider(s) to see what they send.
@seppel123 commented on GitHub (Apr 2, 2026):
How can I trace this information? In my Docker logs I can't see any abnormalities when it happens.
@Classic298 commented on GitHub (Apr 2, 2026):
Log the output using a filter, or run Open WebUI with debug logging.
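If it helps, a hedged sketch of how that could look with Docker, assuming the `GLOBAL_LOG_LEVEL` environment variable documented by Open WebUI (the image tag, ports, and volume flags here are placeholders; adapt them to your own deployment):

```shell
# Sketch only: restart the container with debug logging enabled.
# GLOBAL_LOG_LEVEL is Open WebUI's documented log-level variable;
# adjust image/tag and your usual volume/port flags as needed.
docker run -d \
  -e GLOBAL_LOG_LEVEL=DEBUG \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Then watch the raw stream chunks coming back from the provider:
docker logs -f open-webui
```

With debug logging on, the incoming provider chunks should appear in the container logs, which is what you need to capture here.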
@seppel123 commented on GitHub (Apr 3, 2026):
I reproduced a broken response. This time it isn't injected into the thinking section: the web_search returned an extremely big response (360,000 tokens), which is too much for the model and breaks it; that's why I don't get an output.
This is not the same problem as shown above, but how can I avoid this one?
web_search is jina.at
Search Result Count: 3
Concurrent Requests: 10
Bypass Embedding and Retrieval: on (full context mode) << I think this is the problem?
Bypass Web Loader: off
Trust Proxy Environment: off
Bypass Embeddings and Retrieval will push the whole load into the model and break the context window.
Am I right?
How do I avoid that without using embedding on web_search?
@Classic298 commented on GitHub (Apr 3, 2026):
Yes. If you bypass embedding and retrieval, you'll inject the whole thing.
The web_search native tool does not use embedding anyway. Do you get 360,000 tokens from one single web_search request?
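To illustrate the arithmetic here, why a ~360,000-token payload cannot fit, a minimal sketch using the rough 4-characters-per-token heuristic (this is not Open WebUI code; the function names are made up for illustration):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def fits_context(documents: list[str], context_window: int, reserve: int = 4096) -> bool:
    # Leave `reserve` tokens for the prompt, chat history, and the answer.
    total = sum(estimate_tokens(d) for d in documents)
    return total <= context_window - reserve

# A ~360,000-token web-search payload vs. a 128k-token context window:
payload = ["x" * 4 * 360_000]          # ~360k estimated tokens
print(fits_context(payload, 128_000))  # the injection alone overflows
```

With "Bypass Embedding and Retrieval" off, retrieval would instead select only the top-matching chunks, keeping the injected context far below the window.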
@seppel123 commented on GitHub (Apr 3, 2026):
I'm confused by the logfile, so I'm uploading it:
log.txt
Can you take a look?
@Classic298 commented on GitHub (Apr 3, 2026):
I reviewed it.
What I can confirm from your log.txt: I can see malformed output framing (`<think<|message|>`) and empty final assistant message chunks while reasoning/tool payload exists, under very large web-search context injection.
About your config question: yes, with Bypass Embeddings and Retrieval in full-context style, large web payloads can flood the model context and increase the chance of malformed/empty final output.
Based on the provided log, this currently looks more like an upstream/provider-side or model issue than an Open WebUI bug. At this stage, we should treat this as an upstream/stream issue or model-dependent behaviour unless we can reproduce it with a known-good provider under similar conditions.
@seppel123 commented on GitHub (Apr 17, 2026):
This is the doc for the response from OSS120B on IONOS: https://docs.ionos.com/cloud/ai/ai-model-hub/models/llms/openai-gpt-oss-120b#reasoning-example
Is their response block right, or do they respond in the wrong format?
Thx!
@seppel123 commented on GitHub (Apr 22, 2026):
Still there even with update 0.9.1
@Classic298 commented on GitHub (Apr 22, 2026):
Looking at the IONOS docs you linked, their non-streaming response is spec-adjacent (it returns both `reasoning` and `reasoning_content` fields, plus `content` separately — Open WebUI handles all three at `utils/middleware.py:4066-4070`). That's fine.
But they don't document their streaming format at all — the example only shows the final usage chunk. That's where this breaks.
The strongest clue is from your earlier log: the assistant output contained `<think<|message|>…`. That's not a normal OpenAI stream — it's a fragment of OpenAI's Harmony chat format (GPT-OSS's native output format uses tokens like `<|start|>assistant<|channel|>analysis<|message|>…<|end|><|start|>assistant<|channel|>final<|message|>…`).
What you're seeing is IONOS's Harmony-to-OpenAI converter leaking raw Harmony tokens into the stream and partially munging `<|channel|>analysis` into `<think`. That corrupt opening tag means Open WebUI's `<think>` parser captures the whole rest of the response (reasoning and final answer) as reasoning, and no `</think>` ever arrives to close it — which is exactly the "everything is in the thinking box, nothing in the main message" symptom.
So: upstream / IONOS provider-side bug, not Open WebUI. Their Harmony-format serializer is broken in streaming mode.
To confirm independently:
Try the same openai/gpt-oss-120b via OpenRouter, Groq, or a self-hosted vLLM/llama.cpp endpoint. If it's fine there, it's IONOS.
Try a non-GPT-OSS reasoning model on IONOS (e.g., Llama 3.3 70B). If that is fine but GPT-OSS isn't, it's specifically IONOS's GPT-OSS/harmony handling.
Report it to IONOS support with one of your debug-log captures showing <think<|message|> — that's unambiguous evidence of a broken harmony decoder.
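For anyone wanting to see the failure mode concretely, here's a minimal sketch of a `<think>…</think>` splitter (not Open WebUI's actual parser; `split_reasoning` is a made-up illustration) showing how a mangled, never-closed open tag swallows the final answer into the reasoning:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer) on <think>...</think> tags.

    Minimal sketch, not Open WebUI's actual parser: if a <think> block is
    opened but never closed, everything after it is treated as reasoning.
    """
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if m:  # well-formed block: reasoning inside, answer after the close tag
        return m.group(1), text[m.end():].lstrip()
    m = re.search(r"<think", text)
    if m:  # opened but never closed: the whole remainder becomes reasoning
        return text[m.start():], ""
    return "", text

# Well-formed stream: reasoning and answer separate.
print(split_reasoning("<think>plan</think>The answer is 4."))
# Mangled Harmony leak: corrupted open tag, no </think> ever arrives,
# so the final answer disappears into the thinking box.
print(split_reasoning("<think<|message|>plan... The answer is 4."))
```

The second call returns an empty answer with everything classed as reasoning — the exact symptom from the original report.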
@seppel123 commented on GitHub (Apr 22, 2026):
Thank you!!!
I contacted IONOS support; they will investigate the issue, and maybe they can correct the response format.
I will share the results.
@seppel123 commented on GitHub (Apr 24, 2026):
Here is the answer from IONOS support; I'm not sure now whether this is possible to fix.
Open WebUI says IONOS is the problem, and IONOS says Open WebUI is the problem :)
[Ticket#:207xxxx]
Hi,
we are compatible with the OpenAI API, but we do not fully implement everything that this API provides.
Potentially also not what OpenWebUI expects in this case.
The stream sends chunks in two separate waves using different fields:
Wave 1 — REASONING chunks (delta.reasoning):
Wave 2 — CONTENT chunks (delta.content):
The Exact Problem
The `delta.reasoning` field is non-standard. The official OpenAI streaming spec has no `reasoning` field in deltas. OpenWebUI handles it by wrapping those chunks in a visible `<think>`…`</think>` thinking block — but it never properly closes that block before the `delta.content` chunks arrive.
The empty transition chunk `{"delta": {"content": ""}}` between the two waves fails to signal to OpenWebUI that thinking is over. So OpenWebUI either:
Streaming — What it should look like
The tags must be inline inside `delta.content`, never in a separate `delta.reasoning` field:
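To make the two shapes concrete, here are illustrative chunk payloads for both formats IONOS describes (hand-written examples, not captured from a real IONOS stream):

```python
import json

# Wave-style stream (what IONOS currently sends): reasoning and content
# arrive in separate delta fields, with an empty transition chunk between.
wave_stream = [
    {"delta": {"reasoning": "First, "}},
    {"delta": {"reasoning": "think it through."}},
    {"delta": {"content": ""}},             # empty transition chunk
    {"delta": {"content": "The answer is 4."}},
]

# Inline-tag stream (what IONOS proposes instead): reasoning is wrapped
# in <think>...</think> inside the standard delta.content field.
inline_stream = [
    {"delta": {"content": "<think>"}},
    {"delta": {"content": "First, think it through."}},
    {"delta": {"content": "</think>"}},
    {"delta": {"content": "The answer is 4."}},
]

for chunk in wave_stream:
    print(json.dumps(chunk))
```

As the next comment argues, Open WebUI actually accepts both shapes; the distinction only matters if the serializer producing them is broken.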
@Classic298 commented on GitHub (Apr 24, 2026):
@seppel123 — commented by Claude on behalf of @Classic298 (I'm currently unable to comment directly myself).
Thanks for forwarding the IONOS response. I traced their described stream through the open-webui source (dev branch) and their diagnosis is incorrect on two of three points:
1. `delta.reasoning` is not "non-standard" relative to how the ecosystem actually works today. DeepSeek uses `reasoning_content`, OpenRouter / xAI / Qwen / MiniMax use `reasoning`, Anthropic passthroughs use `thinking` — Open WebUI handles all three at `backend/open_webui/utils/middleware.py:4108-4112`. IONOS emitting `delta.reasoning` is fine and is not the bug.
2. The claim that "empty `{"delta": {"content": ""}}` fails to close the thinking block" is factually wrong. Open WebUI transitions from reasoning to content on the first non-empty content chunk, at `middleware.py:4145-4170`. The empty chunk is a no-op — which is correct, because there is no "empty content signals thinking is over" convention in OpenAI's streaming spec that a client should honor, and IONOS didn't invent one either.
Walking the exact stream they described through the code:
- `{"delta":{"reasoning":"Answer"}}` → opens a reasoning item, accumulates "Answer".
- `{"delta":{"content":""}}` → `value = ""`, falsy; `if value:` skipped. No-op. Reasoning stays open (correct).
- `{"delta":{"content":"The"}}` → `value = "The"`, truthy. Enters the close branch at 4145: the last item is a reasoning item with `attributes.type == 'reasoning_content'`, so it marks it `status='completed'`, sets `ended_at`/`duration`, appends a fresh message item, and streams "The" into the message.
So the reasoning → content transition works for the exact stream IONOS described. No bug on our side.
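That walkthrough can be modeled in a few lines (a sketch of the described behaviour, not the actual `middleware.py` code; `consume_stream` is a made-up name):

```python
def consume_stream(chunks):
    """Accumulate an OpenAI-style delta stream into (reasoning, content).

    Minimal sketch of the transition walked through above: reasoning
    chunks accumulate into the reasoning side; an EMPTY content chunk is
    falsy and therefore a no-op; the first non-empty content chunk is
    what switches output over to the message.
    """
    reasoning, content = [], []
    for chunk in chunks:
        delta = chunk.get("delta", {})
        if value := delta.get("reasoning"):
            reasoning.append(value)
        value = delta.get("content")
        if value:  # "" is falsy: the empty transition chunk is skipped
            content.append(value)
    return "".join(reasoning), "".join(content)

# The exact stream IONOS described:
stream = [
    {"delta": {"reasoning": "Answer"}},
    {"delta": {"content": ""}},     # no-op; reasoning stays open
    {"delta": {"content": "The"}},  # first non-empty content starts the message
]
print(consume_stream(stream))  # -> ('Answer', 'The')
```

Run against the exact three-chunk stream, the reasoning and the message come out cleanly separated, matching the walkthrough above.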
3. The actual root cause from the original log is a Harmony format token leak, and that IS on IONOS's side. GPT-OSS natively outputs Harmony tokens like `<|start|>assistant<|channel|>analysis<|message|>…<|end|><|start|>assistant<|channel|>final<|message|>…`. In your earlier log we saw `<think<|message|>…` in the assistant output — that's IONOS's Harmony-to-OpenAI converter mangling `<|channel|>analysis` into `<think`, producing a malformed opening tag. Open WebUI's reasoning-tag detector (`middleware.py:153-162`) then keeps the reasoning block open because no `</think>` ever arrives. That's exactly the "everything is in the thinking box, nothing in the main message" symptom.
Their proposed "fix" (emitting `<think>`/`</think>` inline inside `delta.content`) would also work — Open WebUI accepts that via the reasoning-tag detector listed above. But they don't need to change format; they need to stop their Harmony serializer from leaking tokens into the stream.
If you want to send something back to IONOS support, the one-line version is: the streaming `delta.reasoning` → `delta.content` transition works correctly in Open WebUI; the actual bug is the Harmony token leak (`<think<|message|>`) their serializer produces, which is unambiguously on their side.