[GH-ISSUE #23844] issue: <think> wrap in convert_output_to_messages leaks reasoning markup for chat templates that don't strip <think> (Gemma 4 et al.) — regression of #23742 #58752

Open
opened 2026-05-05 23:51:13 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @nulltea on GitHub (Apr 17, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23844

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

dev @ 3dd825581 ("refac" — the preserve-reasoning_content change from PR #23742, landed 2026-04-17). Same bug is present on :main (pre-PR-#23742 behavior) via a slightly different path, but this issue targets current dev.

Ollama Version (if applicable)

N/A — backend is llama.cpp (ghcr.io/ggml-org/llama.cpp:server-vulkan, build b8772-bafae2765) exposing an OpenAI-compatible /v1/chat/completions endpoint.

Operating System

Ubuntu 22.04

Browser (if applicable)

Zen

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

After a tool-call round-trip with a reasoning-capable model served over an OpenAI-compatible endpoint that natively emits delta.reasoning_content (e.g. llama.cpp serving unsloth/gemma-4-26B-A4B-it-GGUF), the reply should render as a collapsible "Thought for N seconds" block with clean assistant content below, identical to the tool-less case.

Actual Behavior

The reply leaks reasoning-channel text into the visible message body. On Gemma 4:

  • Begins with a literal thought token rendered as its own paragraph.
  • Sometimes followed by a literal <channel|> marker.
  • The full chain-of-thought renders as regular message content instead of in the reasoning collapsible.

Tool-less chats with the same model show no leak — reasoning separates correctly.

Steps to Reproduce

  1. Run Open WebUI current dev in Docker.

  2. Add an OpenAI-compatible connection to a llama.cpp server serving unsloth/gemma-4-26B-A4B-it-GGUF (tested with ghcr.io/ggml-org/llama.cpp:server-vulkan build b8772-bafae2765).

  3. Admin Panel → Settings → Models → gemma 4 (26B A4B): Function Calling = Native. Other Advanced Params = Default.

  4. Workspace → Tools → create a minimal tool:

    class Tools:
        def lookup(self, q: str) -> str:
            """Return a fixed fact about the query."""
            return f"Fact about {q}: it is a concept."
    
  5. New chat, select the model, enable the tool, send any prompt the model decides to call the tool for.

  6. Observe: reply begins with literal thought / <channel|> markup.
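For reference, the minimal tool from step 4 is trivially correct and can be exercised standalone before wiring it into the UI (its return value follows directly from the f-string), which isolates the leak to the message shape Open WebUI builds for the follow-up request rather than the tool itself:

```python
# Minimal tool from the repro steps, runnable outside Open WebUI.
class Tools:
    def lookup(self, q: str) -> str:
        """Return a fixed fact about the query."""
        return f"Fact about {q}: it is a concept."

print(Tools().lookup("test"))  # Fact about test: it is a concept.
```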

Reproduction without the UI (same leak via direct API call, isolating it to Open WebUI's message shape): send the assistant history exactly as convert_output_to_messages(raw=True) currently builds it — a reasoning_content field AND a <think>…</think> wrap in content:

curl -sSk -N -X POST <openai-compat-endpoint>/v1/chat/completions \
  -H "Authorization: Bearer <key>" -H "Content-Type: application/json" -d '{
    "model":"<gemma-4-26b-a4b-alias>",
    "messages":[
      {"role":"user","content":"test"},
      {"role":"assistant","content":"<think>reasoning text</think>","reasoning_content":"reasoning text","tool_calls":[{"type":"function","id":"c1","function":{"name":"lookup","arguments":"{\"q\":\"test\"}"}}]},
      {"role":"tool","tool_call_id":"c1","name":"lookup","content":"fact"}
    ],
    "stream":true,"max_tokens":80,
    "tools":[{"type":"function","function":{"name":"lookup","description":"x","parameters":{"type":"object","properties":{"q":{"type":"string"}},"required":["q"]}}}]
  }'

The delta.content stream starts with the literal tokens thought, \n, <channel|>, then bleeds the reply. Removing just the <think>…</think> wrap from the assistant's content (keeping the reasoning_content field alone) makes the same request stream reasoning cleanly in delta.reasoning_content and only the final answer in delta.content.
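The fix direction implied above — keep the reasoning_content field but drop the <think> wrap from content — can be sketched as a small post-processing step over the assistant message. The helper name strip_think_wrap and the regex are hypothetical illustrations, not Open WebUI's actual API (see PR #23843 for the real fix); the sketch only shows the message shape that streams cleanly:

```python
import re

# Hypothetical helper (not Open WebUI code): strip a leading
# <think>…</think> block from an assistant message's content while
# leaving reasoning_content untouched, so chat templates that don't
# strip <think> (e.g. Gemma) never see reasoning markup in content.
THINK_RE = re.compile(r"^\s*<think>.*?</think>\s*", re.DOTALL)

def strip_think_wrap(message: dict) -> dict:
    msg = dict(message)
    if msg.get("role") == "assistant" and isinstance(msg.get("content"), str):
        msg["content"] = THINK_RE.sub("", msg["content"])
    return msg

# The assistant message from the curl repro above:
leaky = {
    "role": "assistant",
    "content": "<think>reasoning text</think>",
    "reasoning_content": "reasoning text",
}
clean = strip_think_wrap(leaky)
# clean["content"] is now empty; reasoning stays only in reasoning_content.
```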

Logs & Screenshots

(Two screenshots: the chat reply with leaked thought / <channel|> markup in the visible message body, and an excerpt of the raw stream output.)

Additional Information

Fix: https://github.com/open-webui/open-webui/pull/23843

GiteaMirror added the bug label 2026-05-05 23:51:13 -05:00

Reference: github-starred/open-webui#58752