Originally created by @nulltea on GitHub (Apr 17, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23844
Check Existing Issues
Installation Method
Docker
Open WebUI Version
dev @ 3dd825581 ("refac" — the preserve-reasoning_content change from PR #23742, landed 2026-04-17). The same bug is present on main (pre-PR #23742 behavior) via a slightly different path, but this issue targets current dev.
Ollama Version (if applicable)
N/A — backend is llama.cpp (ghcr.io/ggml-org/llama.cpp:server-vulkan, build b8772-bafae2765) exposing an OpenAI-compatible /v1/chat/completions endpoint.
Operating System
Ubuntu 22.04
Browser (if applicable)
Zen
Confirmation
I have read and followed all instructions in README.md.
Expected Behavior
After a tool-call round-trip with a reasoning-capable model served over an OpenAI-compatible endpoint that natively emits
delta.reasoning_content (e.g. llama.cpp serving unsloth/gemma-4-26B-A4B-it-GGUF), the reply should render as a collapsible "Thought for N seconds" block with clean assistant content below, identical to the tool-less case.
Actual Behavior
The reply leaks reasoning-channel text into the visible message body. On Gemma 4, the visible reply begins with a literal thought token and a <channel|> marker. Tool-less chats with the same model show no leak; reasoning separates correctly.
Steps to Reproduce
1. Run Open WebUI current dev in Docker.
2. Add an OpenAI-compatible connection to a llama.cpp server serving unsloth/gemma-4-26B-A4B-it-GGUF (tested with ghcr.io/ggml-org/llama.cpp:server-vulkan build b8772-bafae2765).
3. Admin Panel → Settings → Models → gemma 4 (26B A4B): Function Calling = Native. Other Advanced Params = Default.
4. Workspace → Tools → create a minimal tool (a sketch follows this list).
5. New chat, select the model, enable the tool, send any prompt the model decides to call the tool for.
6. Observe: the reply begins with literal thought/<channel|> markup.
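The tool's contents don't matter; any trivial tool the model decides to call reproduces the leak. A minimal sketch following Open WebUI's Workspace Tools convention (a Tools class with typed, docstring-ed methods); the get_current_time method is an illustration, not the reporter's exact tool:

```python
# Minimal Workspace tool: Open WebUI exposes the typed, docstring-ed
# methods of a Tools class to the model for native function calling.
from datetime import datetime


class Tools:
    def get_current_time(self) -> str:
        """Get the current date and time as an ISO 8601 string."""
        return datetime.now().isoformat()
```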
Reproduction without the UI (same leak via a direct API call, which isolates it to Open WebUI's message shape): send the assistant history exactly as convert_output_to_messages(raw=True) currently builds it, i.e. with both a reasoning_content field AND a <think>…</think> wrap in content. The delta.content stream then starts with the literal tokens thought, \n, <channel|>, and bleeds the rest of the reply. Removing just the <think>…</think> wrap from the assistant's content (keeping the reasoning_content field alone) makes the same request stream reasoning cleanly in delta.reasoning_content and only the final answer in delta.content. A sketch of such a request follows.
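A minimal sketch of that direct call, assuming the llama.cpp server at http://localhost:8080 passes the extra reasoning_content message field through; the model name, prompt, reasoning text, and tool result are placeholders:

```python
# Replay a tool-call round-trip with the assistant turn shaped the way
# convert_output_to_messages(raw=True) currently builds it: reasoning in a
# reasoning_content field AND duplicated as a <think>…</think> wrap in content.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

reasoning = "The user wants the time, so I should call get_current_time."
messages = [
    {"role": "user", "content": "What time is it?"},
    {
        "role": "assistant",
        "content": f"<think>{reasoning}</think>",  # drop this wrap and the leak disappears
        "reasoning_content": reasoning,            # this field alone streams cleanly
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_current_time", "arguments": "{}"},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "2026-04-17T12:00:00"},
]

stream = client.chat.completions.create(
    model="gemma-4-26b-a4b", messages=messages, stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        print("[reasoning]", delta.reasoning_content)
    if delta.content:
        # With the <think> wrap present, this begins "thought", "\n", "<channel|>".
        print("[content]", delta.content)
```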
Logs & Screenshots
Additional Information
Fix: https://github.com/open-webui/open-webui/pull/23843
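For context, a hypothetical illustration of the shape change the repro points at (not the actual diff in PR #23843; build_assistant_message is an invented helper): when the backend natively supports reasoning_content, keep reasoning in that field alone rather than also wrapping it into content.

```python
# Hypothetical sketch only: emit reasoning either as a native field (raw=True)
# or as a <think> wrap in content, never both at once.
from typing import Optional


def build_assistant_message(content: str, reasoning: Optional[str], raw: bool) -> dict:
    message: dict = {"role": "assistant", "content": content}
    if reasoning:
        if raw:
            # Backend understands the native reasoning channel: keep it separate.
            message["reasoning_content"] = reasoning
        else:
            # Fallback for backends without a reasoning field: inline wrap.
            message["content"] = f"<think>{reasoning}</think>{content}"
    return message
```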