[GH-ISSUE #10492] Disable Thinking Mode #6901

Closed
opened 2026-04-12 18:46:49 -05:00 by GiteaMirror · 26 comments

Originally created by @ChenDianWzh on GitHub (Apr 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10492

With the advent of Qwen3, I feel that Ollama could add a new parameter, usable from Python calls, to control whether the model thinks or not. I hope this can be implemented, thank you.
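
Note: Ollama 0.9.0 and later expose a `think` option for models with thinking control (no such parameter existed when this issue was opened). A minimal Python sketch, assuming an Ollama server on 0.9.0+ and an ollama-python release that forwards the `think` argument:

```python
# Minimal sketch: disable thinking for a Qwen3 chat call.
# Assumes an Ollama server >= 0.9.0 and an ollama-python release that
# supports the `think` argument; the model name is just an example.
from ollama import chat

response = chat(
    model="qwen3:8b",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    think=False,  # skip the <think> block for models that support thinking control
)

print(response.message.content)
```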

GiteaMirror added the feature request label 2026-04-12 18:46:49 -05:00

@crazyi commented on GitHub (Apr 30, 2025):

I also want to know how to disable thinking in ollama, or how to switch between the two freely.

@yjwu-leadstec commented on GitHub (Apr 30, 2025):

Input /nothink after your prompt

@yebanliuying commented on GitHub (Apr 30, 2025):

That's right, similar to: ollama run qwen3:32b -- enable_thinking=false

@myf5 commented on GitHub (Apr 30, 2025):

vLLM provides this switch via chat_template_kwargs in the API call. How can I do something similar in ollama?

```
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'
```
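
For the Ollama side: since version 0.9.0 the native /api/chat endpoint accepts a top-level `think` boolean, which is the closest analogue to vLLM's `chat_template_kwargs`. A rough Python sketch of the same request, assuming such a server version, with the sampling settings moved under `options`:

```python
# Rough Ollama analogue of the vLLM request above (assumes Ollama >= 0.9.0,
# where /api/chat accepts a top-level "think" flag; older servers lack it).
import requests

payload = {
    "model": "qwen3:8b",
    "messages": [
        {"role": "user", "content": "Give me a short introduction to large language models."}
    ],
    "think": False,           # counterpart of chat_template_kwargs.enable_thinking = false
    "stream": False,
    "options": {              # sampling settings live under "options" in Ollama's native API
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "num_predict": 8192,  # rough equivalent of max_tokens
    },
}

r = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
r.raise_for_status()
print(r.json()["message"]["content"])
```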

@smileyboy2019 commented on GitHub (Apr 30, 2025):

@yebanliuying How do I set this in the ollama API?

@ChenDianWzh commented on GitHub (Apr 30, 2025):

I want to call it from Python and see whether thinking can be disabled.

@yjwu-leadstec commented on GitHub (Apr 30, 2025):

![Image](https://github.com/user-attachments/assets/4420c220-8b4f-4b00-bfe6-9985785b215f)

Check their blog.

@ChenDianWzh commented on GitHub (Apr 30, 2025):

Using ollama from Python is still different from this; I hope a parameter can be added to control it.

@yjwu-leadstec commented on GitHub (Apr 30, 2025):

https://bailian.console.aliyun.com/?tab=api#/api/?type=model&url=https%3A%2F%2Fhelp.aliyun.com%2Fdocument_detail%2F2712576.html

Or take a look at Alibaba's API documentation.

@ChenDianWzh commented on GitHub (Apr 30, 2025):

thank you bro

@kalustian commented on GitHub (May 10, 2025):

> That's right, similar to: ollama run qwen3:32b -- enable_thinking=false

I agree, I would also like to see a built-in parameter in Ollama to disable the thinking mode. Sometimes it gets annoying to type /no_think during inference.

@hintdesk commented on GitHub (May 12, 2025):

It would be great if we could disable it via parameters. I tried it in the template, but it doesn't work for DeepSeek at all.

![Image](https://github.com/user-attachments/assets/93c45aac-6e89-41f2-9e2b-04adc421a4e1)

@yjwu-leadstec commented on GitHub (May 12, 2025):

> It would be great if we could disable it via parameters. I tried it in the template, but it doesn't work for DeepSeek at all.
>
> Image

Disable DeepSeek thinking? Only the Qwen3 MoE models can disable thinking... DeepSeek is not a Qwen3 MoE model...

@fslongjin commented on GitHub (May 29, 2025):

I've developed a proxy tool that disables the thinking mode of qwen3 on ollama. Once the proxy is up and running, you simply need to set the Ollama endpoint used for qwen3 requests to the proxy.

https://github.com/fslongjin/qwen3-ollama-no-thinking-proxy

@anuramat commented on GitHub (May 29, 2025):

I think it's implemented in 0.9.0: https://github.com/ollama/ollama/releases/tag/v0.9.0-rc0
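
The 0.9.0 release describes per-request thinking control. A short sketch of switching it per call from Python, assuming an Ollama server on 0.9.0+ and an ollama-python release that exposes the `think` argument and a `message.thinking` field:

```python
# Sketch: toggling thinking per request (assumes Ollama >= 0.9.0 and a
# recent ollama-python; the `thinking` field only appears when think=True).
from ollama import chat

messages = [{"role": "user", "content": "What is 17 * 23?"}]

with_thinking = chat(model="qwen3:8b", messages=messages, think=True)
print("reasoning:", with_thinking.message.thinking)  # chain of thought, kept separate from content
print("answer:", with_thinking.message.content)

without_thinking = chat(model="qwen3:8b", messages=messages, think=False)
print("answer:", without_thinking.message.content)   # no <think> block in the output
```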

@1257mp commented on GitHub (Jul 17, 2025):

> Input /nothink after your prompt

Ollama is (at least through my testing) now using the `/set` command to enable/disable modes/settings/etc., so this should now be:

`/set nothink` --> to DISABLE thinking while running the model (I have only really confirmed this behaviour with the deepseek LLMs)

`/set think` --> to ENABLE thinking while running the model (again, only tested on deepseek LLMs)

All of this should be run after the `ollama run <LLM model name>` command from the CLI, unless you are using another method to run this.

@mkozjak commented on GitHub (Aug 7, 2025):

Doesn't work with Ollama 0.9.6.

```
> ollama run qwen3:4b
>>> /set nothink
Set 'nothink' mode.
>>> hello
<think>
Okay, the user sent "hello". I need to respond appropriately. First, I should greet them in a friendly and professional
way. Since they just said "hello", I should acknowledge their greeting and maybe ask how I can assist them today.

Let me think about the best response. In Chinese, "hello" is "你好". So I can start with "你好!有什么我可以帮助你的吗?
" which translates to "Hello! How can I help you today?".

Wait, but sometimes people might want a more casual response. Let me check the standard phrases. In Chinese, common
greetings are "你好" followed by a question to engage them.

I should make sure the response is natural and not too formal. Maybe add a smiley emoji to keep it friendly. Let me
see. The user might be testing if I can handle simple greetings, so the response should be straightforward.

Also, need to avoid any errors in the Chinese. Let me confirm the translation. "你好!有什么我可以帮助你的吗?" is
correct. The phrase "有什么我可以帮助你的吗" is a common way to ask "How can I help you?"

Yes, that's right. So the response would be: 你好!有什么我可以帮助你的吗?

Wait, but the user might be expecting an English response. Wait, the user wrote "hello" in English, but the
instructions say the assistant should respond in Chinese. Wait, the problem says "You are an AI assistant. You must
respond in Chinese. Please write the response."

Wait, the user's message is "hello", and the instruction says to respond in Chinese. So I need to respond in Chinese.

Yes, so the response should be in Chinese. So the correct response is 你好!有什么我可以帮助你的吗?

Let me check if there's a more natural way. Sometimes people use "你好!有什么我可以帮你的吗?" but "帮助你" is more
common than "帮助你的". Wait, "帮助你" is the object, so "有什么我可以帮助你的吗" is correct.

Alternatively, maybe "你好!需要我帮忙吗?" but that's a bit more direct. But the user said "hello", so the standard
response is to ask how they can help.

Hmm, the most natural response in Chinese for "hello" is "你好!有什么我可以帮助你的吗?" So I'll go with that.

Wait, but the user might be using English, but the assistant must respond in Chinese. So the response is in Chinese.
Let me make sure the grammar is correct.

Yes, "有什么我可以帮助你的吗" is correct. The structure is "有什么(what)我可以(I can)帮助你的(help you)吗(
question marker)".

So the response is 你好!有什么我可以帮助你的吗?

Adding a smiley emoji might be good, like 你好!有什么我可以帮助你的吗?😊

But the instructions don't specify emojis, but in Chinese chat, emojis are common. Maybe better to include one to be
friendly.

Wait, the user's message is "hello", so the response should be concise and friendly.

Yes, I think that's the right response.
</think>

你好!有什么我可以帮助你的吗?😊

>>> Send a message (/? for help)
```

@rick-github commented on GitHub (Aug 7, 2025):

```console
$ ollama -v
ollama version is 0.9.6
$ ollama run qwen3:4b
>>> /set nothink
Set 'nothink' mode.
>>> hello
Hello! How can I assist you today? 😊

>>>
```

Try re-pulling the model; think support in ollama requires an updated template.

@mkozjak commented on GitHub (Aug 7, 2025):

> $ ollama -v
> ollama version is 0.9.6
> $ ollama run qwen3:4b
> >>> /set nothink
> Set 'nothink' mode.
> >>> hello
> Hello! How can I assist you today? 😊
>
> Try re-pulling the model; think support in ollama requires an updated template.

```
mkozjak@mbp:~ > ollama rm qwen3:4b
deleted 'qwen3:4b'
mkozjak@mbp:~ > ollama pull qwen3:4b
pulling manifest
pulling 3e4cb1417446: 100% ▕███████████████████████████████████████████████████████████████▏ 2.5 GB
pulling 53e4ea15e8f5: 100% ▕███████████████████████████████████████████████████████████████▏ 1.5 KB
pulling d18a5cc71b84: 100% ▕███████████████████████████████████████████████████████████████▏  11 KB
pulling cff3f395ef37: 100% ▕███████████████████████████████████████████████████████████████▏  120 B
pulling e18a783aae55: 100% ▕███████████████████████████████████████████████████████████████▏  487 B
verifying sha256 digest
writing manifest
success
mkozjak@mbp:~ > ollama -v
ollama version is 0.9.6
Warning: client version is 0.11.3
mkozjak@mbp:~ > ollama run qwen3:4b
>>> /set nothink
Set 'nothink' mode.
>>> hello
<think>
Okay, the user said "hello". I need to respond appropriately. Let me think.

First, "hello" is a greeting, so I should acknowledge it. Maybe start with a friendly response. Since the user is just
saying hello, I don't have much context. I should keep it simple and open-ended^C

>>> Send a message (/? for help)
```

I'm on a Mac.

@rick-github commented on GitHub (Aug 7, 2025):

It looks like the model was updated 11 hours ago to push new weights and remove the thinking control:

```diff
$ diff -u <(ollama show --template qwen3:4b-orig) <(ollama show --template qwen3:4b)
--- /dev/fd/63	2025-08-07 12:28:31.619240649 +0200
+++ /dev/fd/62	2025-08-07 12:28:31.620240717 +0200
@@ -30,14 +30,7 @@
 {{- range $i, $_ := .Messages }}
 {{- $last := eq (len (slice $.Messages $i)) 1 -}}
 {{- if eq .Role "user" }}<|im_start|>user
-{{ .Content }}
-{{- if and $.IsThinkSet (eq $i $lastUserIdx) }}
-   {{- if $.Think -}}
-      {{- " "}}/think
-   {{- else -}}
-      {{- " "}}/no_think
-   {{- end -}}
-{{- end }}<|im_end|>
+{{ .Content }}<|im_end|>
 {{ else if eq .Role "assistant" }}<|im_start|>assistant
 {{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
 <think>{{ .Thinking }}</think>
@@ -54,11 +47,5 @@
 </tool_response><|im_end|>
 {{ end }}
 {{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
-{{ if and $.IsThinkSet (not $.Think) -}}
-<think>
-
-</think>
-
-{{ end -}}
 {{ end }}
 {{- end }}
\ No newline at end of file
```

Perhaps the new model will have a different mechanism for controlling thinking.

@drifkin

@drifkin commented on GitHub (Aug 7, 2025):

So these new Qwen models don't have thinking control; instead, they expect you to use a thinking model vs. a non-thinking model. We'll think about whether we should offer automatic switching in the CLI via these existing commands, but we need to think through the implications of that a bit more; it could get really complicated!

@TranQuyenSinh commented on GitHub (Aug 11, 2025):

I found that the `/set nothink` command or `--think=false` also works with qwen3:8b but doesn't with qwen3:4b.

@MrMuhannadObeidat commented on GitHub (Aug 15, 2025):

--think=false doesn't work with qwen3:4b, making the model completely unusable, for me at least. What a shame!

@rick-github commented on GitHub (Aug 15, 2025):

Use the non-thinking version of the model, [qwen3:4b-instruct-2507-q4_K_M](https://ollama.com/library/qwen3:4b-instruct-2507-q4_K_M).

```console
$ ollama run qwen3:4b-instruct-2507-q4_K_M hello
Hello! How can I assist you today? 😊
```

@user123-source commented on GitHub (Aug 16, 2025):

Please list these models separately from the original Qwen3 to avoid this confusion.

@MrMuhannadObeidat commented on GitHub (Aug 16, 2025):

@rick-github thanks for pointing to that. It works without the thinking piece.

Reference: github-starred/ollama#6901