[GH-ISSUE #12234] qwen3:4b still output thinking progress with Chat API "think = false". #54652

Closed
opened 2026-04-29 06:45:53 -05:00 by GiteaMirror · 10 comments

Originally created by @owenzhao on GitHub (Sep 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12234

What is the issue?

qwen3:latest, which is the 8b model, works well. However, qwen3:4b still outputs its thinking progress in the content.

![Image](https://github.com/user-attachments/assets/8bf16da8-55a6-452b-aee4-3f737c621533)

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.11.10
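
For context, here is a minimal reproduction sketch of the kind of request involved (added for illustration, not taken from the original report): a non-streaming call to Ollama's native chat endpoint with thinking disabled. The prompt text, model tag, and local host/port are assumptions; the `think` and `stream` fields belong to the native `/api/chat` API.

```python
# Minimal sketch, assuming a local Ollama server on the default port.
# The prompt below is a hypothetical example, not the reporter's actual input.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:4b",
        "messages": [
            {"role": "user", "content": 'Translate to Chinese: "PROXY: %lld"'}
        ],
        "think": False,   # thinking disabled via the native API flag
        "stream": False,
    },
    timeout=120,
)
body = resp.json()

# With the reported bug, reasoning text still shows up inside message.content
# instead of being suppressed (or separated out into a dedicated field).
print(json.dumps(body.get("message", {}), ensure_ascii=False, indent=2))
```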

GiteaMirror added the bug label 2026-04-29 06:45:53 -05:00

@owenzhao commented on GitHub (Sep 9, 2025):

I think this is probably a regression in Ollama, since as I recall qwen3:4b behaved correctly when the think parameter was first introduced.


@owenzhao commented on GitHub (Sep 9, 2025):

Also, I have "stream = false", if that is relevant.


@rick-github commented on GitHub (Sep 9, 2025):

#12022


@owenzhao commented on GitHub (Sep 9, 2025):

> [#12022](https://github.com/ollama/ollama/issues/12022)

Thank you. However, it seems there is only one qwen3:4b on ollama.com. Can I get qwen3:4b-instruct from ollama.com, or do I have to get it from somewhere else?


@rick-github commented on GitHub (Sep 9, 2025):

https://ollama.com/library/qwen3/tags


@owenzhao commented on GitHub (Sep 9, 2025):

I also tested with 0.6b and 1.7b; they both work well most of the time. However, sometimes 1.7b will append "/no_think" to answers. For example:

When translating "PROXY: %lld", you get "代理:%lld /no_think".


@rick-github commented on GitHub (Sep 9, 2025):

> When translating "PROXY: %lld", you get "代理:%lld /no_think".

This is a side effect of Ollama trying to control the thinking mode. The [Ollama template](https://ollama.com/library/qwen3:1.7b/blobs/ae370d884f) controls thinking by adding `/think` or `/no_think` to the end of the last user message. If that message is something the model is going to send back to the client (e.g. when translating something), then it's possible that the control token will be sent back as well. You can work around this by moving the code that adds the control token to the part of the template that creates the system message. The only drawback is that the user message can then override the system message and toggle the thinking flag.

This may be why qwen decided to move away from a hybrid model for the update to qwen3:4b.
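
To make the leak described above concrete, here is a hedged sketch (added for illustration, not part of the original comment) of the mechanism: the API-level flag is rendered as a soft control token appended to the last user message, which a translation prompt can then echo back verbatim. The function name and structure are hypothetical, not the actual template code.

```python
# Hypothetical approximation of the hybrid-template behaviour described above:
# the think flag becomes a soft token appended to the last *user* message.
def apply_think_flag(user_message: str, think: bool) -> str:
    return user_message + (" /think" if think else " /no_think")

prompt = apply_think_flag("PROXY: %lld", think=False)
print(prompt)  # "PROXY: %lld /no_think"

# A translation request over this text can faithfully echo the suffix back,
# e.g. "代理:%lld /no_think". The workaround mentioned above would append the
# control token to the system message instead, keeping the user text clean.
```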


@owenzhao commented on GitHub (Sep 10, 2025):

Thanks for your explanation, @rick-github.

However, I am not using "/no_think" to control it. I am using Ollama's API with "think = false". I know that putting "/no_think" in the prompt worked before, but since Ollama added the "think" parameter to its own API, I have switched to that.

I intend to use the Ollama native API instead of the OpenAI-compatible API, as the latter currently doesn't support the "think" parameter.


@rick-github commented on GitHub (Sep 10, 2025):

That is how the `think` API works. The template adds `/think` or `/no_think` based on the value of the flag passed in the API call.


@owenzhao commented on GitHub (Sep 10, 2025):

> That is how the `think` API works. The template adds `/think` or `/no_think` based on the value of the flag passed in the API call.

Really? In that case, only qwen3:4b-instruct can be recommended, since it is the only one that reliably follows the no-thinking setting, yet it doesn't support thinking at all. That amuses me somehow.

Reference: github-starred/ollama#54652