[GH-ISSUE #14972] Add enable_thinking parameter to disable CoT/Reasoning generation #9626

Closed
opened 2026-04-12 22:31:40 -05:00 by GiteaMirror · 2 comments

Originally created by @HavenCTO on GitHub (Mar 20, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14972

Requesting a native boolean flag (enable_thinking: false) to explicitly disable Chain-of-Thought (CoT) reasoning for models like Qwen 3.5 and GPT-OSS.

Current issues:

  • Unreliable workarounds: System prompts ("Do not think") are often ignored.
  • Performance hit: Models generate 5-10x more tokens on "thinking" than on the actual answer, drastically slowing inference (e.g., 80 t/s → 10 t/s).
  • Truncation: Users must set extremely low num_predict limits, which cuts off valid answers mid-sentence.
  • Stability: High reasoning settings frequently cause infinite loops in agents.

Proposed Solution:

Add a dedicated parameter to the API and Modelfile to toggle reasoning generation off, ensuring direct, fast, and stable responses without relying on fragile prompt engineering.

GiteaMirror added the feature request label 2026-04-12 22:31:40 -05:00

@rick-github commented on GitHub (Mar 20, 2026):

Set "think":false in the API request. gpt-oss can't disable thinking entirely, but its thinking will be reduced.

$ curl -s localhost:11434/api/generate -d '{"model":"qwen3.5","prompt":"hello","think":false,"stream":false}' | jq 'del(.context)'
{
  "model": "qwen3.5",
  "created_at": "2026-03-20T08:55:37.015107541Z",
  "response": "Hello! How can I help you today?",
  "done": true,
  "done_reason": "stop",
  "total_duration": 601682609,
  "load_duration": 506548726,
  "prompt_eval_count": 13,
  "prompt_eval_duration": 26263746,
  "eval_count": 10,
  "eval_duration": 65306368
}
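
The same request can be made from Python. This is a minimal sketch using only the standard library; it assumes a local Ollama server on the default port 11434, as in the curl example. The helper names (`build_request`, `generate`, `tokens_per_second`) are hypothetical. Ollama reports `eval_duration` in nanoseconds, so generation speed is `eval_count / eval_duration * 1e9`; for the response above that works out to roughly 153 tokens/s.

```python
import json
import urllib.request

# Assumption: default local Ollama endpoint, as used in the curl example.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> bytes:
    """Encode a JSON body matching the curl example: thinking off, no streaming."""
    body = {"model": model, "prompt": prompt, "think": False, "stream": False}
    return json.dumps(body).encode("utf-8")


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """POST the request to a running Ollama server and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tokens_per_second(reply: dict) -> float:
    """Generation speed: eval_count tokens divided by eval_duration (nanoseconds)."""
    return reply["eval_count"] / reply["eval_duration"] * 1e9


# Values copied from the response above, so this part runs without a server.
sample = {"eval_count": 10, "eval_duration": 65306368}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # prints "153.1 tokens/s"
```

With a server running, `tokens_per_second(generate("qwen3.5", "hello"))` would measure a live reply the same way, which makes it easy to compare speeds with `think` on and off.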

@HavenCTO commented on GitHub (Mar 20, 2026):

Thanks, but this request is for server-side disablement of reasoning.
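
Until a server-side switch exists, one possible stopgap is to rewrite incoming request bodies so that `think` is always `false`. This assumes you run your own gateway or proxy in front of Ollama. The sketch shows only the rewriting step; `enforce_no_thinking` and `THINKING_ENDPOINTS` are hypothetical names, not part of Ollama, and wiring this into an actual proxy is left out. Per the reply above, forcing `false` reduces but does not fully disable thinking for gpt-oss.

```python
import copy

# Hypothetical gateway policy, not an Ollama feature: endpoints whose bodies accept "think".
THINKING_ENDPOINTS = {"/api/generate", "/api/chat"}


def enforce_no_thinking(path: str, body: dict) -> dict:
    """Return a copy of the request body with thinking forced off for supported endpoints."""
    if path not in THINKING_ENDPOINTS:
        return body  # leave other endpoints untouched
    patched = copy.deepcopy(body)  # do not mutate the client's original payload
    patched["think"] = False  # overrides whatever the client sent
    return patched


client_body = {"model": "qwen3.5", "prompt": "hello", "think": True}
print(enforce_no_thinking("/api/generate", client_body))
```

Copying the body rather than mutating it keeps the original request available for logging or comparison inside the gateway.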

Reference: github-starred/ollama#9626