Originally created by @nulltea on GitHub (Apr 17, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23844
Check Existing Issues
Installation Method
Docker
Open WebUI Version
dev @ 3dd825581 ("refac" — the preserve-reasoning_content change from PR #23742, landed 2026-04-17). The same bug is present on main (pre-PR #23742 behavior) via a slightly different path, but this issue targets current dev.
Ollama Version (if applicable)
N/A — backend is llama.cpp (ghcr.io/ggml-org/llama.cpp:server-vulkan, build b8772-bafae2765) exposing an OpenAI-compatible /v1/chat/completions endpoint.
Operating System
Ubuntu 22.04
Browser (if applicable)
Zen
Confirmation
I have read and followed all instructions in README.md.
Expected Behavior
After a tool-call round-trip with a reasoning-capable model served over an OpenAI-compatible endpoint that natively emits
delta.reasoning_content (e.g. llama.cpp serving unsloth/gemma-4-26B-A4B-it-GGUF), the reply should render as a collapsible "Thought for N seconds" block with clean assistant content below, identical to the tool-less case.
Actual Behavior
The reply leaks reasoning-channel text into the visible message body. On Gemma 4, the visible reply begins with a literal thought token and a <channel|> marker. Tool-less chats with the same model show no leak; reasoning separates correctly.
Steps to Reproduce
1. Run Open WebUI current dev in Docker.
2. Add an OpenAI-compatible connection to a llama.cpp server serving unsloth/gemma-4-26B-A4B-it-GGUF (tested with ghcr.io/ggml-org/llama.cpp:server-vulkan build b8772-bafae2765).
3. Admin Panel → Settings → Models → gemma 4 (26B A4B): Function Calling = Native. Other Advanced Params = Default.
4. Workspace → Tools → create a minimal tool (a sketch follows this list).
5. New chat, select the model, enable the tool, send any prompt the model decides to call the tool for.
6. Observe: the reply begins with literal thought/<channel|> markup.
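The tool's contents don't matter; any trivial tool the model decides to call reproduces the leak. A minimal sketch following Open WebUI's Workspace Tools convention (a Tools class with typed, docstring-ed methods); the get_current_time method is an illustration, not the reporter's exact tool:

```python
# Minimal Workspace tool: Open WebUI exposes the typed, docstring-ed
# methods of a Tools class to the model for native function calling.
from datetime import datetime


class Tools:
    def get_current_time(self) -> str:
        """Get the current date and time as an ISO 8601 string."""
        return datetime.now().isoformat()
```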
Reproduction without the UI (same leak via a direct API call, which isolates it to Open WebUI's message shape): send the assistant history exactly as convert_output_to_messages(raw=True) currently builds it, i.e. with both a reasoning_content field AND a <think>…</think> wrap in content. The delta.content stream then starts with the literal tokens thought, \n, <channel|>, and bleeds the rest of the reply. Removing just the <think>…</think> wrap from the assistant's content (keeping the reasoning_content field alone) makes the same request stream reasoning cleanly in delta.reasoning_content and only the final answer in delta.content. A sketch of such a request follows.
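A minimal sketch of that direct call, assuming the llama.cpp server at http://localhost:8080 passes the extra reasoning_content message field through; the model name, prompt, reasoning text, and tool result are placeholders:

```python
# Replay a tool-call round-trip with the assistant turn shaped the way
# convert_output_to_messages(raw=True) currently builds it: reasoning in a
# reasoning_content field AND duplicated as a <think>…</think> wrap in content.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

reasoning = "The user wants the time, so I should call get_current_time."
messages = [
    {"role": "user", "content": "What time is it?"},
    {
        "role": "assistant",
        "content": f"<think>{reasoning}</think>",  # drop this wrap and the leak disappears
        "reasoning_content": reasoning,            # this field alone streams cleanly
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_current_time", "arguments": "{}"},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "2026-04-17T12:00:00"},
]

stream = client.chat.completions.create(
    model="gemma-4-26b-a4b", messages=messages, stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        print("[reasoning]", delta.reasoning_content)
    if delta.content:
        # With the <think> wrap present, this begins "thought", "\n", "<channel|>".
        print("[content]", delta.content)
```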
Logs & Screenshots
Additional Information
Fix: https://github.com/open-webui/open-webui/pull/23843
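For context, a hypothetical illustration of the shape change the repro points at (not the actual diff in PR #23843; build_assistant_message is an invented helper): when the backend natively supports reasoning_content, keep reasoning in that field alone rather than also wrapping it into content.

```python
# Hypothetical sketch only: emit reasoning either as a native field (raw=True)
# or as a <think> wrap in content, never both at once.
from typing import Optional


def build_assistant_message(content: str, reasoning: Optional[str], raw: bool) -> dict:
    message: dict = {"role": "assistant", "content": content}
    if reasoning:
        if raw:
            # Backend understands the native reasoning channel: keep it separate.
            message["reasoning_content"] = reasoning
        else:
            # Fallback for backends without a reasoning field: inline wrap.
            message["content"] = f"<think>{reasoning}</think>{content}"
    return message
```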