issue: Rendering bug when the response from the model contains <think> #6156

Closed
opened 2025-11-11 16:46:18 -06:00 by GiteaMirror · 4 comments
Owner

Originally created by @alanxmay on GitHub (Aug 21, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.6.22

Ollama Version (if applicable)

No response

Operating System

ubuntu 22.04

Browser (if applicable)

chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The non-reasoning model's output should not be rendered as a thinking block; the literal <think> text should be shown as-is, just like in Qwen Chat:
Image: https://github.com/user-attachments/assets/48f2cac9-9ee7-4010-b424-7384ded3604b

Actual Behavior

Image: https://github.com/user-attachments/assets/2526b522-3c9d-4920-852b-449cc5f7b914

Steps to Reproduce

USER: repeat <think> 5 times
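Asking a non-reasoning model to echo the tag is enough to trigger the bug: a tag-based parser treats any <think> in the streamed text as the start of a reasoning block. A minimal sketch of the misfire (the detector below is illustrative, not Open WebUI's actual parser):

```python
import re

# Naive tag-based reasoning detection: any "<think>" in the stream is
# treated as opening a reasoning block, even when a non-reasoning model
# is just echoing the literal text the user asked for.
THINK_OPEN = re.compile(r"<think>")

def looks_like_reasoning(text: str) -> bool:
    """Return True if a tag-based parser would open a 'thinking' block."""
    return bool(THINK_OPEN.search(text))

# The model is simply complying with "repeat <think> 5 times" ...
echoed = "<think> <think> <think> <think> <think>"
print(looks_like_reasoning(echoed))  # True: the tag heuristic misfires
```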

Logs & Screenshots

Image: https://github.com/user-attachments/assets/7c936af8-fd46-497a-bc70-0e5e531d7c95

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 16:46:18 -06:00

@alanxmay commented on GitHub (Aug 21, 2025):

related issue #15461


@tjbck commented on GitHub (Aug 21, 2025):

This is not a trivial fix, as a lot of Ollama models depend on this behaviour. That said, we can investigate a way to make this an option in the model editor.


@alanxmay commented on GitHub (Aug 22, 2025):

The key issue is how to distinguish reasoning content from the response provided by a provider API. Since many providers use vLLM as the inference engine, their responses generally support the reasoning_content field (https://docs.vllm.ai/en/v0.9.1/features/reasoning_outputs.html#streaming-chat-completions).

from openai import OpenAI

client = OpenAI(...)
...
stream = client.chat.completions.create(model=model,
                                        messages=messages,
                                        stream=True)

print("client: Start streaming chat completions...")
printed_reasoning_content = False
printed_content = False

for chunk in stream:
    reasoning_content = None
    content = None
    # Check the content is reasoning_content or content
    if hasattr(chunk.choices[0].delta, "reasoning_content"):
        reasoning_content = chunk.choices[0].delta.reasoning_content
    elif hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content

    if reasoning_content is not None:
        if not printed_reasoning_content:
            printed_reasoning_content = True
            print("reasoning_content:", end="", flush=True)
        print(reasoning_content, end="", flush=True)
    elif content is not None:
        if not printed_content:
            printed_content = True
            print("\ncontent:", end="", flush=True)
        # Extract and print the content
        print(content, end="", flush=True)

So a better solution may be to make this a per-provider option.
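One hypothetical shape for such a per-provider option (the field names, the mapping, and the dispatch helper below are illustrative sketches, not Open WebUI's actual configuration):

```python
# Hypothetical per-provider setting describing where each provider exposes
# reasoning text in a streamed delta (names are illustrative only).
REASONING_FIELD = {
    "vllm": "reasoning_content",      # chunk.choices[0].delta.reasoning_content
    "deepseek": "reasoning_content",  # same delta field as vLLM
    "ollama": "thinking",             # chunk.message.thinking
    "tag-based": None,                # fall back to parsing <think> tags
}

def split_delta(provider: str, delta: dict) -> tuple:
    """Return (reasoning, content) for one streamed delta, per provider."""
    field = REASONING_FIELD.get(provider)
    if field and delta.get(field):
        return delta[field], None
    return None, delta.get("content")

# Literal "<think>" in content is never misread as reasoning, because the
# split is driven by the provider's declared field, not by tag sniffing.
print(split_delta("deepseek", {"reasoning_content": "comparing 9.11 and 9.8"}))
print(split_delta("vllm", {"content": "repeat <think> 5 times"}))
```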

For example, in the DeepSeek API (https://api-docs.deepseek.com/guides/reasoning_model#api-example), the reasoning content is also located in chunk.choices[0].delta.reasoning_content.

from openai import OpenAI
client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=True
)

reasoning_content = ""
content = ""

for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content
    else:
        content += chunk.choices[0].delta.content

# Round 2
messages.append({"role": "assistant", "content": content})
messages.append({'role': 'user', 'content': "How many Rs are there in the word 'strawberry'?"})
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=True
)
# ...

For Ollama (https://ollama.com/blog/thinking), when stream=True, the reasoning content lives in chunk.message.thinking.

import ollama from 'ollama'

async function main() {
  const response = await ollama.chat({
    model: 'deepseek-r1',
    messages: [
      {
        role: 'user',
        content: 'What is 10 + 23',
      },
    ],
    stream: true,
    think: true,
  })

  let startedThinking = false
  let finishedThinking = false

  for await (const chunk of response) {
    if (chunk.message.thinking && !startedThinking) {
      startedThinking = true
      process.stdout.write('Thinking:\n========\n\n')
    } else if (chunk.message.content && startedThinking && !finishedThinking) {
      finishedThinking = true
      process.stdout.write('\n\nResponse:\n========\n\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      process.stdout.write(chunk.message.content)
    }
  }
}

main()

@tjbck commented on GitHub (Aug 26, 2025):

Closing in favour of #16930


Reference: github-starred/open-webui#6156