issue: Non-closing think section with latest openAI API backend with reasoning_parser #5618

Closed
opened 2025-11-11 16:26:27 -06:00 by GiteaMirror · 2 comments

Originally created by @jingyibo123 on GitHub (Jun 23, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

0.6.15

Ollama Version (if applicable)

No response

Operating System

Ubuntu 20.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Expected behavior: show reasoning_content in the frontend's folded think-section, and once reasoning_content ends and content starts, close the think-section and render the content outside it.

Actual Behavior

When using think-enabled models such as Qwen3 on a recent OpenAI-compatible server deployed with lmdeploy, which streams reasoning_content first and then content without a closing </think> tag, the Open WebUI frontend shows a never-ending thinking section containing both the reasoning_content and the actual content.

lmdeploy API behavior (stream response):

... content=<think>...
... reasoning_content="XX"...
... reasoning_content="XX"...
... reasoning_content="XX"...
... content="XX"...
... content="XX"...

vLLM API behavior (stream response):

... content=""...
... reasoning_content="XX"...
... reasoning_content="XX"...
... reasoning_content="XX"...
... content="XX"...
... content="XX"...

without reasoning enabled:

... content="<think>"...
... content="XX"...
... content="XX"...
... content="</think>"...
... content="XX"...
... content="XX"...

Open WebUI works correctly when reasoning is not enabled, or when it is enabled with vLLM, but NOT with lmdeploy.
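
For illustration, here is a minimal client-side sketch (not Open WebUI's actual code) of the normalization described under Expected Behavior, assuming OpenAI-python-style delta objects and a reasoning parser that keeps all of the reasoning in reasoning_content:

def normalize_stream(deltas):
    """Wrap reasoning_content deltas in a think-section and close it
    as soon as regular content resumes."""
    in_think = False
    for delta in deltas:
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            if not in_think:
                in_think = True
                yield "<think>"
            yield reasoning
        if content:
            if in_think:
                in_think = False
                yield "</think>"
            # lmdeploy additionally emits a literal "<think>" as its first
            # content delta (see the dump above); drop it so it is not
            # duplicated. This is only safe while the reasoning parser is
            # enabled, since content then never carries a matching </think>.
            if content != "<think>":
                yield content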

Steps to Reproduce

Deploy the model with lmdeploy:

lmdeploy serve api_server \
    Qwen/Qwen3-32B \
    --reasoning-parser qwen-qwq \
    --tool-call-parser qwen
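
To inspect the raw stream deltas, a small script like the following can be used (a sketch: 23333 is lmdeploy's default port, and the model name and prompt are placeholders; adjust to your deployment):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")

stream = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    # reasoning_content is a non-standard field; the OpenAI SDK keeps
    # unknown response fields on the model, so read it defensively.
    print("content=", repr(delta.content),
          "reasoning_content=", repr(getattr(delta, "reasoning_content", None)))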

Logs & Screenshots

No error in logs.

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 16:26:27 -06:00

@tjbck commented on GitHub (Jun 24, 2025):

This is a model-level issue.


@jingyibo123 commented on GitHub (Jun 25, 2025):

@tjbck I'd like to know: what is Open WebUI's expected streaming-response format for a think-enabled OpenAI-compatible chat API?

Reference: github-starred/open-webui#5618