[GH-ISSUE #11259] issue: Think tags not detected if opening tag is in prompt template? #54827

Closed
opened 2026-05-05 16:46:40 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @bjj on GitHub (Mar 6, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/11259

Check Existing Issues

  • I have searched the existing issues and discussions.

Installation Method

Docker

Open WebUI Version

v0.5.20 (latest)

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24.04

Browser (if applicable)

Edge

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have checked the browser console logs.
  • I have checked the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

After prompting QwQ-32B, the initial <think> should be recognized as thinking when it is part of the prompt template (see the end of https://huggingface.co/Qwen/QwQ-32B/blob/main/tokenizer_config.json , compare the preview: https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/tokenizer_config.json )

Actual Behavior

Thinking starts streaming in as part of the answer, eventually ending in </think>, but Open WebUI doesn't recognize it as thinking.

Steps to Reproduce

Get QwQ-32B (not the preview), submit any prompt (thinking is forced because it's in the chat template).
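The failure mode can be sketched in a few lines, assuming a simplified leading-tag detector (hypothetical helper, not Open WebUI's actual code): because the chat template already emits the opening <think>, the completion the server streams back never contains it, only the closing tag.

```python
# Minimal sketch of why a leading-tag detector misses forced thinking:
# the opening tag lives in the prompt, not in the completion.

PROMPT_TAIL = "<|im_start|>assistant\n<think>\n"   # tail of the rendered prompt
COMPLETION = "First, consider what the user asked...\n</think>\nFinal answer."

def starts_with_think(completion: str) -> bool:
    """Detector that only flags reasoning when the stream opens with the tag."""
    return completion.lstrip().startswith("<think>")

print(starts_with_think(COMPLETION))   # False: rendered as a plain answer
print("</think>" in COMPLETION)        # True: the close tag still arrives
```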

Logs & Screenshots

![Image](https://github.com/user-attachments/assets/2d82d4d6-f7f5-4fde-aed0-56e184df6d3d)

Additional Information

No response

GiteaMirror added the bug label 2026-05-05 16:46:40 -05:00
Author
Owner

@bjj commented on GitHub (Mar 6, 2025):

This might be specific to vLLM, since it is using the tokenizer_config literally. llama-serve, at least, seems to prune it off of the prompt and let the model generate it (?)

Author
Owner

@mindkrypted commented on GitHub (Mar 6, 2025):

@bjj
It's not specific to vLLM, I'm using TabbyAPI to serve the quanted model with exllamaV2 and it's also not being recognized properly within Open-WebUI.

There's an open discussion on the HF model's page https://huggingface.co/Qwen/QwQ-32B/discussions/4

The <think> tag is provided by the inference server, but it's not displayed in Open-WebUI. (see the screenshot with detailed logs)

![Image](https://github.com/user-attachments/assets/bd696258-9254-47ed-bf43-38b30e26753c)

Author
Owner

@Lzhang-hub commented on GitHub (Mar 6, 2025):

It's because the first <think> is in the chat template, so the model's first output token is not <think>, and Open WebUI can't detect it.

https://huggingface.co/Qwen/QwQ-32B/blob/main/tokenizer_config.json
![Image](https://github.com/user-attachments/assets/af8d43d9-667d-4258-bb9c-c8791e1d2eed)
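Given this, one tolerant handling can be sketched as a hypothetical post-processing step (not Open WebUI's actual implementation): if a close tag appears before any open tag, assume the template already supplied the opening tag and prepend it.

```python
def normalize_think_tags(completion: str,
                         open_tag: str = "<think>",
                         close_tag: str = "</think>") -> str:
    """Prepend the opening tag when the stream closes a block it never opened.

    Covers templates (QwQ-32B, recent DeepSeek-R1 distills) whose generation
    prompt already ends with the opening tag, so the model never emits it.
    """
    open_pos = completion.find(open_tag)
    close_pos = completion.find(close_tag)
    if close_pos != -1 and (open_pos == -1 or open_pos > close_pos):
        return open_tag + completion
    return completion

print(normalize_think_tags("reasoning...</think>answer").startswith("<think>"))  # True
print(normalize_think_tags("plain answer"))  # unchanged: no tags at all
```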

Author
Owner

@alvarolopez commented on GitHub (Mar 7, 2025):

This is also happening with DeepSeek models (c.f. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/blame/main/tokenizer_config.json)

The change was introduced weeks ago in https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/commit/74fbf131a939963dd1e244389bb61ad0d0440a4d

Author
Owner

@mindkrypted commented on GitHub (Mar 8, 2025):

With TabbyAPI, I'm able to get the "normal" <think> tag when removing it from the chat template.
The end looks like this after the modification:
`{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n`
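That workaround amounts to deleting the forced <think> from the generation prompt so the model emits the tag itself. A minimal sketch of the edit (the template strings here are illustrative, not the exact tokenizer_config contents):

```python
# Sketch of the template-side workaround: strip the forced "<think>" from the
# tail of the chat template so the model generates the opening tag itself.

ORIGINAL_TAIL = (
    "{%- if add_generation_prompt %}\n"
    "    {{- '<|im_start|>assistant\\n<think>\\n' }}\n"
    "{%- endif %}\n"
)
PATCHED_TAIL = ORIGINAL_TAIL.replace(
    "<|im_start|>assistant\\n<think>\\n",
    "<|im_start|>assistant\\n",
)

print("<think>" in PATCHED_TAIL)  # False: the model now emits the opening tag
```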

Reference: github-starred/open-webui#54827