[GH-ISSUE #10457] Can enable_thinking parameter be adjusted by passing a map through environment variables? #6875

Closed
opened 2026-04-12 18:43:16 -05:00 by GiteaMirror · 7 comments

Originally created by @somnifex on GitHub (Apr 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10457

For example, in a Docker deployment, pass a map through an environment variable, e.g.

`-e enable_thinking={"qwen3:14b": False, "qwen3:30b": True, ...}`

to adjust the `enable_thinking` parameter per model.
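A minimal sketch of how a server might consume such a variable (this is not an existing Ollama feature; the variable name `enable_thinking` comes from the example above, and the value is assumed to be JSON, so booleans would be lowercase):

```python
import json
import os

def load_thinking_overrides(env_var: str = "enable_thinking") -> dict:
    """Parse a hypothetical JSON map of model name -> thinking flag from an
    environment variable, e.g.
    enable_thinking='{"qwen3:14b": false, "qwen3:30b": true}'."""
    raw = os.environ.get(env_var)
    if not raw:
        return {}
    try:
        overrides = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # malformed value: ignore rather than fail at startup
    # keep only well-formed entries (string model name -> bool)
    return {m: v for m, v in overrides.items()
            if isinstance(m, str) and isinstance(v, bool)}

# example
os.environ["enable_thinking"] = '{"qwen3:14b": false, "qwen3:30b": true}'
print(load_thinking_overrides())  # {'qwen3:14b': False, 'qwen3:30b': True}
```

The per-model lookup would then happen once per request, falling back to the model's default when no override is present.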

GiteaMirror added the feature request label 2026-04-12 18:43:16 -05:00

@jeffrey-cwj commented on GitHub (Apr 29, 2025):

How do I add the `enable_thinking` param when using the completion API?


@lyfuci commented on GitHub (Apr 29, 2025):

@jeffrey-cwj @utopeadia
https://qwenlm.github.io/blog/qwen3/

The blog gives an answer that may help:

> We provide a soft switch mechanism that allows users to dynamically control the model's behavior when enable_thinking=True. Specifically, you can add /think and /no_think to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.

I followed this answer and it works, but you still need to remove the blank `<think></think>` block yourself.


@somnifex commented on GitHub (Apr 29, 2025):

> @jeffrey-cwj @utopeadia https://qwenlm.github.io/blog/qwen3/
>
> The blog gives an answer that may help: "We provide a soft switch mechanism that allows users to dynamically control the model's behavior when enable_thinking=True. Specifically, you can add /think and /no_think to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations."
>
> I followed this answer and it works, but you still need to remove the blank `<think></think>` block yourself.

This solution is feasible, but I believe Ollama can be further optimized for Qwen3. According to the Qwen3 description, for compatibility reasons the model's output still includes an empty `<think></think>` tag even when thinking mode is not used. However, since Ollama is a downstream application of the base model, removing this tag might be more user-friendly for end users.
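Stripping the empty block downstream is cheap; a minimal sketch of what such post-processing might look like (this is not Ollama's actual implementation):

```python
import re

# matches a <think> block containing only whitespace, plus trailing whitespace
EMPTY_THINK = re.compile(r"<think>\s*</think>\s*")

def strip_empty_think(text: str) -> str:
    """Remove the empty <think></think> block that Qwen3 emits even when
    thinking mode is off; non-empty reasoning blocks are left intact."""
    return EMPTY_THINK.sub("", text, count=1)

print(strip_empty_think("<think>\n\n</think>\n\nHello!"))  # -> Hello!
```

Because the pattern only matches whitespace between the tags, real reasoning output passes through untouched.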


@hchasens commented on GitHub (Apr 29, 2025):

Parsing `/no_think` or `/think` from the prompt seems like a temporary solution. What happens when a user wants to include "/think" in the context of a conversation without triggering the switch? I think this is a good stand-in solution until the API can be updated and applications using ollama as a dependency can follow suit. I'd be very surprised if models capable of dual modes weren't produced by other companies. I think, in time, it'll become another parameter like `temp` that you just set.

This solution is user friendly and it allows application developers to quickly test how the model will work with each flag set. I do look forward to seeing a reasoning button added to open-webui though.

That said, a permanent API solution is a must imo.


@rick-github commented on GitHub (Apr 29, 2025):

To clarify, /no_think or /think are not parsed from the prompt. The model is trained to switch when it sees these in the input token stream. There's nothing the inference framework (ollama, LMstudio, vLLM, etc) can do to influence this, other than rewriting the prompt to remove them.

```console
$ ollama run qwen3:8b hello, what does /no_think mean?
<think>

</think>

The `/no_think` command is not a standard or widely recognized command in most applications, operating systems, or
...
```

@yebanliuying commented on GitHub (Apr 30, 2025):

That's right, something like: `ollama run qwen3:32b -- enable_thinking=false`


@rick-github commented on GitHub (May 30, 2025):

https://github.com/ollama/ollama/releases/tag/v0.9.0
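The linked v0.9.0 release added first-class thinking controls, including a `think` field on the chat API. A minimal sketch of building such a request with thinking disabled (the payload is only constructed here, not sent; check the current API docs for exact field semantics):

```python
import json

def build_chat_request(model: str, prompt: str, think: bool) -> dict:
    """Build a payload for Ollama's /api/chat endpoint. Since v0.9.0 the
    API accepts a top-level `think` flag (see the release linked above);
    field semantics may evolve, so treat this as a sketch."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
        "stream": False,
    }

payload = build_chat_request("qwen3:14b", "hello", think=False)
print(json.dumps(payload, indent=2))
```

The same release also added CLI-side controls, so the per-request switch no longer has to ride inside the prompt text.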

Reference: github-starred/ollama#6875