[PR #10584] [MERGED] add thinking support to the api and cli #75579

Closed
opened 2026-05-05 08:00:14 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10584
Author: @drifkin
Created: 5/6/2025
Status: Merged
Merged: 5/29/2025
Merged by: @drifkin

Base: main ← Head: drifkin/thinking-api-support


📝 Commits (8)

  • de22af6 add thinking support to the api and cli
  • 2da7b0c don't render control chars for thinking prefix when piping
  • 2d33175 document think parameter and thinking response
  • df0cd44 remove extraneous Think option
  • a4590b4 fix err location in tuple
  • 7d1e98b thinkingParseState -> thinkingState
  • 45f63d7 address review comments
  • 6c50278 remove bad log added back from rebase

📊 Changes

17 files changed (+1195 additions, -49 deletions)

📝 api/types.go (+19 -2)
📝 api/types_test.go (+47 -0)
📝 cmd/cmd.go (+163 -15)
📝 cmd/interactive.go (+32 -0)
➕ cmd/warn_thinking_test.go (+63 -0)
📝 docs/api.md (+3 -0)
📝 model/bytepairencoding.go (+10 -1)
📝 readline/types.go (+2 -0)
📝 server/images.go (+18 -2)
📝 server/prompt.go (+11 -3)
📝 server/prompt_test.go (+2 -1)
📝 server/routes.go (+77 -12)
📝 server/routes_generate_test.go (+19 -0)
➕ server/thinking.go (+300 -0)
➕ server/thinking_test.go (+403 -0)
📝 template/template.go (+25 -13)
📝 types/model/capability.go (+1 -0)

📄 Description

Users can now control whether thinking models think. When thinking is enabled, the thinking output is parsed and returned separately from the content.

  • Both /api/generate and /api/chat now accept a "think" option that specifies whether thinking mode is on or off
  • Templates get passed this new option so, e.g., qwen3's template can put /think or /no_think in the system prompt depending on the value of the setting
  • Models' thinking support is inferred by inspecting model templates. The prefix and suffix that the parser uses to identify thinking blocks are also inferred automatically from templates
  • Thinking control & parsing are opt-in via the API to prevent breaking existing API consumers. If the "think" option is not specified, the behavior is unchanged from previous versions of ollama
  • Add parsing for thinking blocks in both streaming and non-streaming modes, for both /generate and /chat
  • Update the CLI to make use of these changes. Users can pass --think or --think=false to control thinking, or during an interactive session they can use the commands /set think or /set nothink
  • A --hidethinking option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like ollama run qwen3 --think --hidethinking "my question here", where you only want to see the answer but still want the benefits of a thinking model

TODO:

  • [x] Don't parse thinking blocks when the user doesn't explicitly set the option, to maintain backwards compatibility
  • [x] Warning on CLI when using a non-thinking/older version of a model (with an old template)
  • [x] Wire up capabilities fully
  • [x] Decide when to fail v. warn (only if thinking is set and true?)
  • [x] Unify parsing for streaming/non-streaming
  • [x] Update templates (to turn on/off & also allow for Assistant "prefixing")
  • [x] Infer tags from template
  • [x] Update python/js libraries
  • [x] don't output control characters in non-interactive terminal cases
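
For readers unfamiliar with why streaming needs its own care, the following is a simplified Go sketch of a parser that splits thinking from content when the tags can arrive split across chunks. It is not the code in server/thinking.go: the thinkingSplitter type, its field names, and the hard-coded <think> and </think> tags are assumptions made for illustration (the PR infers the real tags from the template), and unlike a production parser it does not trim whitespace around the tags.

```go
package main

import (
	"fmt"
	"strings"
)

type splitState int

const (
	lookingForOpen splitState = iota // only a possible opening tag seen so far
	inThinking                       // inside the thinking block
	inContent                        // thinking finished, or never started
)

// thinkingSplitter is a hypothetical, simplified streaming parser.
type thinkingSplitter struct {
	open, close string
	st          splitState
	buf         string // held-back text that may still turn into a tag
}

// overlap returns the length of the longest suffix of s that is a prefix of tag.
func overlap(s, tag string) int {
	n := len(tag)
	if len(s) < n {
		n = len(s)
	}
	for ; n > 0; n-- {
		if strings.HasSuffix(s, tag[:n]) {
			return n
		}
	}
	return 0
}

// Add consumes one streamed chunk and returns the thinking and content text
// that is safe to emit so far.
func (p *thinkingSplitter) Add(chunk string) (thinking, content string) {
	p.buf += chunk
	for {
		switch p.st {
		case lookingForOpen:
			trimmed := strings.TrimLeft(p.buf, " \n\t")
			if strings.HasPrefix(trimmed, p.open) {
				p.buf = trimmed[len(p.open):]
				p.st = inThinking
				continue
			}
			if strings.HasPrefix(p.open, trimmed) {
				return // may still become the opening tag; wait for more input
			}
			p.st = inContent // no thinking block; everything is content
		case inThinking:
			if i := strings.Index(p.buf, p.close); i >= 0 {
				thinking += p.buf[:i]
				p.buf = p.buf[i+len(p.close):]
				p.st = inContent
				continue
			}
			// Hold back a trailing partial closing tag such as "</th".
			keep := overlap(p.buf, p.close)
			thinking += p.buf[:len(p.buf)-keep]
			p.buf = p.buf[len(p.buf)-keep:]
			return
		case inContent:
			content += p.buf
			p.buf = ""
			return
		}
	}
}

func main() {
	p := &thinkingSplitter{open: "<think>", close: "</think>"}
	for _, chunk := range []string{"<thi", "nk>step one</th", "ink>answer"} {
		t, c := p.Add(chunk)
		fmt.Printf("thinking=%q content=%q\n", t, c)
	}
}
```

Feeding the whole response as a single chunk goes through the same Add method, which is one way the streaming and non-streaming paths could share a parser.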

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 08:00:14 -05:00
Reference: github-starred/ollama#75579