[GH-ISSUE #12575] Using "/no_think" with HYBRID models does not work anymore #70402

Closed
opened 2026-05-04 21:26:07 -05:00 by GiteaMirror · 4 comments

Originally created by @LFd3v on GitHub (Oct 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12575

What is the issue?

Hi there.

As the title says: at least since v0.12.3, using "/no_think" in a chat message does not disable "thinking" mode for hybrid models such as qwen3:8b.

Please note that this issue refers to models that are hybrid, meaning their inference can be done in both modes. One example is qwen3:8b, which has only one version, unlike qwen3:4b, which was later split into separate instruct and thinking versions. But the issue affects all hybrid models that I tested, not only qwen3. It used to work if "/no_think" was added to either a user message or to the system prompt, but now it is simply ignored in both cases.

It is very useful for hybrid models, since the user can toggle "thinking" mode in the middle of a conversation when needed. As of now, that is no longer possible.

This used to work perfectly fine right after the initial qwen3 models were released. Unfortunately, I am not sure exactly which Ollama version broke this feature. If requested, I can try to go back and test older versions.

Thanks for working on Ollama as an Open Source project.

Regards

Relevant log output

N/A

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.12.5

GiteaMirror added the bug label 2026-05-04 21:26:08 -05:00

@rick-github commented on GitHub (Oct 11, 2025):

What client are you using?


@LFd3v commented on GitHub (Oct 11, 2025):

I mostly use Ollama with the CLI client and Page Assist. For instance, this used to work:

```
ollama run qwen3:8b "Tell me a joke about cheese /no_think"
```

or just start a chat with `ollama run qwen3:8b` and then type "tell me a joke about cheese /no_think", and the reply will still include "thinking" traces. The same happens with `/set system "I am a helpful assistant /no_think"`, which used to work as well.


@rick-github commented on GitHub (Oct 11, 2025):

https://github.com/ollama/ollama/pull/12533 made thinking the default for think-enabled models, to solve some template issues. That means controlling thinking now needs to be done via the API, rather than through instructions in the prompt. When using the ollama CLI this is straightforward and similar to the way you are using it now:

ollama run qwen3:8b "Tell me a joke about cheese" --think=false
$ ollama run qwen3:8b
>>> /set nothink
Set 'nothink' mode.
>>> tell me a joke about cheese
Why don't cheeses ever get cold?

Because they always keep their **cheese**! 🧀😄

Similarly, with clients like OpenWebUI, users can set the `think` parameter from the chat control drop-down in the upper right.
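
For clients or scripts that call the API directly, the same control is the `think` field on `/api/chat` (and `/api/generate`). A minimal sketch, assuming a server on the default local port:

```console
$ curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Tell me a joke about cheese"}],
  "think": false,
  "stream": false
}'
```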

It's more problematic if a user is using a client that doesn't allow setting parameters in the API call. In that case, the simplest approach would be to modify the template to remove thinking control:

```console
$ echo FROM qwen3:8b > Modelfile
$ ollama show --modelfile qwen3:8b | egrep -v "^FROM |think.*think" >> Modelfile
$ ollama create qwen3:8b-manual-think
```

```console
$ ollama run qwen3:8b-manual-think
>>> hello
<think>
Okay, the user sent "hello". I need to respond in a friendly and helpful way. Let me start by greeting them back. Maybe say "Hello!" to be polite. Then, ask how I can assist them. Keep it open-ended so
they feel comfortable sharing what they need help with. Make sure the tone is positive and approachable. Let me check if there's anything else I should add. Maybe a smiley emoji to keep it friendly.
Alright, that should work.
</think>

Hello! 😊 How can I assist you today? I'm here to help with any questions or tasks you might have!

>>> /set system "I am a helpful assistant /no_think"
Set system message.
>>> hello
<think>

</think>

Hello again! 😊 How can I help you today? I'm here to assist with anything you need!
```

Since ollama no longer knows this model has thinking control, the cosmetic rendering of `<think>` tags is no longer done.
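
To check what ollama has detected for a model, the `/api/show` endpoint includes a `capabilities` list in its response; a quick way to compare the original and modified models (assuming the default local port; the exact list varies by model and version):

```console
$ curl -s http://localhost:11434/api/show -d '{"model": "qwen3:8b"}'
$ curl -s http://localhost:11434/api/show -d '{"model": "qwen3:8b-manual-think"}'
```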


@LFd3v commented on GitHub (Oct 11, 2025):

@rick-github Thank you a lot for the info and explanations. So, just to make it clear: this feature is no longer available unless the workaround mentioned above is applied to the model template, right? Too bad. If that is the case, and there is no chance it will be implemented again, I guess this issue can be closed. At least it may help someone else who still wants to control thinking from user messages for qwen3 and other models. Regards
