issue: Rendering bug when the response from the model contains <think> #6156

Closed
opened 2025-11-11 16:46:18 -06:00 by GiteaMirror · 4 comments
Owner

Originally created by @alanxmay on GitHub (Aug 21, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.6.22

Ollama Version (if applicable)

No response

Operating System

ubuntu 22.04

Browser (if applicable)

chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The non-reasoning model's output should not be rendered as a thinking block; the literal <think> text should be shown as-is, just like in Qwen Chat:
Image: https://github.com/user-attachments/assets/48f2cac9-9ee7-4010-b424-7384ded3604b

Actual Behavior

Image: https://github.com/user-attachments/assets/2526b522-3c9d-4920-852b-449cc5f7b914

Steps to Reproduce

USER: repeat <think> 5 times
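Asking a non-reasoning model to echo the tag is enough to trigger the bug: a tag-based parser treats any <think> in the streamed text as the start of a reasoning block. A minimal sketch of the misfire (the detector below is illustrative, not Open WebUI's actual parser):

```python
import re

# Naive tag-based reasoning detection: any "<think>" in the stream is
# treated as opening a reasoning block, even when a non-reasoning model
# is just echoing the literal text the user asked for.
THINK_OPEN = re.compile(r"<think>")

def looks_like_reasoning(text: str) -> bool:
    """Return True if a tag-based parser would open a 'thinking' block."""
    return bool(THINK_OPEN.search(text))

# The model is simply complying with "repeat <think> 5 times" ...
echoed = "<think> <think> <think> <think> <think>"
print(looks_like_reasoning(echoed))  # True: the tag heuristic misfires
```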

Logs & Screenshots

Image: https://github.com/user-attachments/assets/7c936af8-fd46-497a-bc70-0e5e531d7c95

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 16:46:18 -06:00

@alanxmay commented on GitHub (Aug 21, 2025):

related issue #15461


@tjbck commented on GitHub (Aug 21, 2025):

This is not a trivial fix, as a lot of Ollama models depend on this behaviour. That said, we can investigate a way to make this an option in the model editor.


@alanxmay commented on GitHub (Aug 22, 2025):

The key issue is how to distinguish reasoning content from the response provided by a provider API. Since many providers use vLLM as the inference engine, their responses generally support the reasoning_content field (https://docs.vllm.ai/en/v0.9.1/features/reasoning_outputs.html#streaming-chat-completions).

from openai import OpenAI

client = OpenAI(...)
...
stream = client.chat.completions.create(model=model,
                                        messages=messages,
                                        stream=True)

print("client: Start streaming chat completions...")
printed_reasoning_content = False
printed_content = False

for chunk in stream:
    reasoning_content = None
    content = None
    # Check the content is reasoning_content or content
    if hasattr(chunk.choices[0].delta, "reasoning_content"):
        reasoning_content = chunk.choices[0].delta.reasoning_content
    elif hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content

    if reasoning_content is not None:
        if not printed_reasoning_content:
            printed_reasoning_content = True
            print("reasoning_content:", end="", flush=True)
        print(reasoning_content, end="", flush=True)
    elif content is not None:
        if not printed_content:
            printed_content = True
            print("\ncontent:", end="", flush=True)
        # Extract and print the content
        print(content, end="", flush=True)

So a better solution may be to make this a per-provider option.
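One hypothetical shape for such a per-provider option (the field names, the mapping, and the dispatch helper below are illustrative sketches, not Open WebUI's actual configuration):

```python
# Hypothetical per-provider setting describing where each provider exposes
# reasoning text in a streamed delta (names are illustrative only).
REASONING_FIELD = {
    "vllm": "reasoning_content",      # chunk.choices[0].delta.reasoning_content
    "deepseek": "reasoning_content",  # same delta field as vLLM
    "ollama": "thinking",             # chunk.message.thinking
    "tag-based": None,                # fall back to parsing <think> tags
}

def split_delta(provider: str, delta: dict) -> tuple:
    """Return (reasoning, content) for one streamed delta, per provider."""
    field = REASONING_FIELD.get(provider)
    if field and delta.get(field):
        return delta[field], None
    return None, delta.get("content")

# Literal "<think>" in content is never misread as reasoning, because the
# split is driven by the provider's declared field, not by tag sniffing.
print(split_delta("deepseek", {"reasoning_content": "comparing 9.11 and 9.8"}))
print(split_delta("vllm", {"content": "repeat <think> 5 times"}))
```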

For example, in the DeepSeek API (https://api-docs.deepseek.com/guides/reasoning_model#api-example), the reasoning content is also located in chunk.choices[0].delta.reasoning_content.

from openai import OpenAI
client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=True
)

reasoning_content = ""
content = ""

for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content
    else:
        content += chunk.choices[0].delta.content

# Round 2
messages.append({"role": "assistant", "content": content})
messages.append({'role': 'user', 'content': "How many Rs are there in the word 'strawberry'?"})
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=True
)
# ...

For Ollama (https://ollama.com/blog/thinking), when stream=True, the reasoning content lives in chunk.message.thinking.

import ollama from 'ollama'

async function main() {
  const response = await ollama.chat({
    model: 'deepseek-r1',
    messages: [
      {
        role: 'user',
        content: 'What is 10 + 23',
      },
    ],
    stream: true,
    think: true,
  })

  let startedThinking = false
  let finishedThinking = false

  for await (const chunk of response) {
    if (chunk.message.thinking && !startedThinking) {
      startedThinking = true
      process.stdout.write('Thinking:\n========\n\n')
    } else if (chunk.message.content && startedThinking && !finishedThinking) {
      finishedThinking = true
      process.stdout.write('\n\nResponse:\n========\n\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      process.stdout.write(chunk.message.content)
    }
  }
}

main()

@tjbck commented on GitHub (Aug 26, 2025):

Closing in favour of #16930


Reference: github-starred/open-webui#6156