[GH-ISSUE #11712] qwen3:235b + ollama 0.10.1 + ubuntu 22.04 don't disable think. #33513

Open
opened 2026-04-22 16:16:42 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @isgbuddy on GitHub (Aug 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11712

What is the issue?

I add /nothink or /no_think to the prompt, or run /set nothink at the Ollama command line, but qwen3:235b still emits its thinking process.
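
For reference, newer Ollama releases also expose a per-request "think" parameter on the chat API, which is a separate path from the /set nothink REPL command. A minimal sketch, assuming a local server on the default port and a model/version pair that honors the parameter:

```shell
# Build the request body; "think": false asks the server to suppress the
# thinking trace (parameter available in newer Ollama releases).
payload='{"model":"qwen3:235b","messages":[{"role":"user","content":"test"}],"think":false,"stream":false}'
echo "$payload"
# Send it against a running Ollama server:
# curl http://localhost:11434/api/chat -d "$payload"
```

Whether this actually suppresses the trace still depends on the model itself; a variant trained to always open with a think block may emit it regardless.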

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-22 16:16:42 -05:00
Author
Owner

@isgbuddy commented on GitHub (Aug 6, 2025):

Interestingly, I run qwen3:0.6b on the same server, and /set nothink works there.

For qwen3:235b, /set nothink doesn't work.

Image
Author
Owner

@isgbuddy commented on GitHub (Aug 6, 2025):

Image
Author
Owner

@minzanupam commented on GitHub (Aug 7, 2025):

Qwen3 had a recent update (2507). Since that update, the thinking and non-thinking variants have been split into separate models, so even if you run the default model in non-thinking mode it will still emit <think> tokens; the default qwen3 models on Ollama are the thinking variants.

If you are fine with downloading both models, I think the non-thinking ones are labeled with the instruct tag.
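
If the split models are what you want, pulling a non-thinking variant would look something like the sketch below. The exact tag name here is a guess, not confirmed in this thread; check the published tags on the model's ollama.com library page first:

```shell
# "qwen3:235b-instruct" is a hypothetical tag used for illustration only;
# list the real tags on https://ollama.com/library/qwen3 before pulling.
model="qwen3:235b-instruct"
echo "ollama pull $model && ollama run $model"
# ollama pull "$model" && ollama run "$model"
```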

Author
Owner

@YuenSzeHong commented on GitHub (Jan 31, 2026):

root@a41a66f5f614:/# ollama -v
ollama version is 0.15.2
root@a41a66f5f614:/# ollama ls
NAME                                       ID              SIZE      MODIFIED
gpt-oss:latest                             17052f91a42e    13 GB     12 minutes ago
qwen3-embedding:latest                     64b933495768    4.7 GB    10 hours ago
x/z-image-turbo:latest                     1053737ea587    12 GB     35 hours ago
qwen3-vl:latest                            901cae732162    6.1 GB    2 days ago
gpt-oss:20b                                aa4295ac10c3    13 GB     5 months ago
goekdenizguelmez/JOSIEFIED-Qwen3:latest    88bfd8fa4e00    5.0 GB    8 months ago
qwen3:latest                               e4b5fd7f8af0    5.2 GB    8 months ago
deepseek-r1:1.5b                           a42b25d8c10a    1.1 GB    12 months ago
deepseek-r1:7b                             0a8c26691023    4.7 GB    12 months ago
llama3.2:3b                                a80c4f17acd5    2.0 GB    16 months ago
root@a41a66f5f614:/# ollama run gpt-oss:20bn
pulling manifest
Error: pull model manifest: file does not exist
root@a41a66f5f614:/# ^C
root@a41a66f5f614:/# ollama run gpt-oss:20b
>>> /set nothink
Set 'nothink' mode.
>>> test
Thinking...
The user just typed "test". They likely want a response. Maybe they want a test of the chat? We should respond
appropriately. Possibly the user is testing the system. We can respond with "Hello! How can I help you today?" or
something. It's a minimal prompt. So answer accordingly.
...done thinking.

Hello! It looks like you’re running a quick test. How can I assist you today?

>>>
root@a41a66f5f614:/# ollama run qwen3
>>> /set nothink
warning: model "qwen3" does not support thinking output
Set 'nothink' mode.
>>> test
<think>
Okay, the user sent "test" as a message. I need to respond appropriately. Let me think about how to handle this.

First, "test" is a common word used to check if a system is working. Since the user might just be testing the
chatbot's response, I should acknowledge their message in a friendly manner. Maybe ask how I can assist them
today. That way, I'm being helpful and open to further interaction. I should keep the response simple and
welcoming. Let me make sure there's no confusion and that the reply is polite. Alright, that should cover it.
</think>

Hello! How can I assist you today? 😊

>>> Send a message (/? for help)
Author
Owner

@ullenboom commented on GitHub (Feb 20, 2026):

I'm running into the same issue with qwen3-vl (https://ollama.com/library/qwen3-vl). It's very talkative.

Setting

 /set nothink

isn't working, and the instruct model is only available in the cloud, not locally.


Reference: github-starred/ollama#33513