[PR #10584] add thinking support to the api and cli #13285

Closed
opened 2026-04-13 00:22:54 -05:00 by GiteaMirror · 0 comments

Original Pull Request: https://github.com/ollama/ollama/pull/10584

State: closed
Merged: Yes


Users can now control whether thinking models think or not, and if enabled, the thinking response is parsed separately from the content.

- Both `/api/generate` and `/api/chat` now accept a `"think"` option that allows specifying whether thinking mode should be on or not
- Templates get passed this new option so, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting
- Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking output are also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming and non-streaming modes in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like `ollama run qwen3 --think --hidethinking "my question here"`, where you just want to see the answer but still want the benefits of thinking models
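As a sketch of the opt-in request shape (the `"think"` field comes from this PR; the model name and message are example values, and the exact response fields may differ by version), a non-streaming `/api/chat` call might look like:

```json
{
  "model": "qwen3",
  "messages": [
    { "role": "user", "content": "how many r's are in strawberry?" }
  ],
  "think": true,
  "stream": false
}
```

with the thinking text returned separately from the content, e.g. (illustrative response; the `thinking` field name is inferred from the PR description):

```json
{
  "model": "qwen3",
  "message": {
    "role": "assistant",
    "thinking": "The user is asking how many times the letter r appears...",
    "content": "There are 3 r's in \"strawberry\"."
  },
  "done": true
}
```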

TODO:

- [x] Don't parse thinking blocks when the user doesn't explicitly set the option, to maintain backwards compatibility
- [x] Warning on CLI when using a non-thinking/older version of a model (with an old template)
- [x] Wire up capabilities fully
- [x] Decide when to fail vs. warn (only if thinking is set and true?)
- [x] Unify parsing for streaming/non-streaming
- [x] Update templates (to turn on/off & also allow for Assistant "prefixing")
- [x] Infer tags from template
- [x] Update python/js libraries
- [x] Don't output control characters in non-interactive terminal cases
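The streaming-parsing items above can be sketched as a small state machine. This is an illustrative Python sketch, not ollama's actual Go implementation; it assumes the reasoning is wrapped in `<think>`/`</think>` tags (in ollama the tags are inferred from the model template) and that any thinking block appears at the start of the output:

```python
class StreamingThinkingParser:
    """Incrementally split a token stream into (thinking, content) pieces.

    Handles tags that are split across chunk boundaries by holding back
    any suffix that could still turn out to be a partial tag.
    """

    def __init__(self, opening="<think>", closing="</think>"):
        self.opening = opening
        self.closing = closing
        self.state = "start"   # start -> thinking -> content
        self.buf = ""

    def feed(self, chunk):
        """Consume one streamed chunk; return (thinking_text, content_text)."""
        thinking_out, content_out = "", ""
        self.buf += chunk

        if self.state == "start":
            stripped = self.buf.lstrip()
            if stripped.startswith(self.opening):
                self.buf = stripped[len(self.opening):]
                self.state = "thinking"
            elif self.opening.startswith(stripped):
                # Could still be a partial opening tag; wait for more input.
                return "", ""
            else:
                self.state = "content"

        if self.state == "thinking":
            idx = self.buf.find(self.closing)
            if idx != -1:
                thinking_out = self.buf[:idx]
                self.buf = self.buf[idx + len(self.closing):].lstrip("\n")
                self.state = "content"
            else:
                # Emit all but a suffix that might be a partial closing tag.
                safe = max(0, len(self.buf) - len(self.closing) + 1)
                thinking_out = self.buf[:safe]
                self.buf = self.buf[safe:]
                return thinking_out, ""

        if self.state == "content":
            content_out = self.buf
            self.buf = ""
        return thinking_out, content_out


# Example: chunks may split the tags arbitrarily.
parser = StreamingThinkingParser()
thinking, content = "", ""
for chunk in ["<thi", "nk>step 1, ", "step 2</th", "ink>\nAnswer: 4"]:
    t, c = parser.feed(chunk)
    thinking += t
    content += c
# thinking == "step 1, step 2", content == "Answer: 4"
```

The non-streaming case then reduces to feeding the full response as a single chunk, which is one way to read the "unify parsing for streaming/non-streaming" item.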
GiteaMirror added the pull-request label 2026-04-13 00:22:54 -05:00

Reference: github-starred/ollama#13285