[GH-ISSUE #11279] Add Option to Disable Automatic <think> Tag Stripping for Input to the LLM #33197

Open
opened 2026-04-22 15:38:06 -05:00 by GiteaMirror · 0 comments

Originally created by @dandydan888 on GitHub (Jul 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11279

Certain models like `qwen3` and `deepseek-r1` have hardcoded filtering logic in [filterThinkTags()](https://github.com/ollama/ollama/blob/5d8c1735296299c3d81bb40f00038398dc729579/server/routes.go#L1652) that strips `<think>…</think>` blocks out of assistant messages before they are passed to the model. This happens even when those tags were deliberately added by the developer, since in recent releases the `<think>` content is no longer preserved in the message's `content`.
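
For illustration, the effect on an assistant message is roughly the following. This is a minimal Python sketch of the behavior described above, not the real implementation; the actual logic is Go code in `server/routes.go`, and its exact pattern and per-model conditions may differ:

```
import re

# Minimal sketch, NOT the real implementation: the actual stripping is Go
# code in server/routes.go (filterThinkTags); the exact regex and the
# per-model conditions may differ.
THINK_BLOCK = re.compile(r"(?s)<think>.*?</think>\s*")

msg = "<think>The user greeted me, so I should greet back.</think>Hi there!"
print(THINK_BLOCK.sub("", msg))  # prints "Hi there!" -- the reasoning is gone
```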

This can lead to unintended consequences: the assistant loses access to its own reasoning from earlier turns, which can make follow-up responses feel inconsistent or misaligned. I've seen this happen especially around tool-usage declarations and other multi-step workflows.

Right now, the easiest workaround is to replace `<think>` with a different, arbitrary tag that doesn't trigger the stripping logic. But that means losing the semantic connection to how the model was likely trained, which isn't ideal.
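
As a sketch of that workaround (Python, with `<reasoning>` as a made-up replacement tag; any tag the filter doesn't match would do):

```
# Hedged sketch of the tag-renaming workaround: rewrite <think>/</think> in
# prior assistant messages to an arbitrary tag (here "<reasoning>", a
# made-up name) so the server-side stripping no longer matches them.
def protect_think_blocks(messages):
    out = []
    for m in messages:
        if m.get("role") == "assistant" and "<think>" in (m.get("content") or ""):
            m = {**m, "content": m["content"]
                 .replace("<think>", "<reasoning>")
                 .replace("</think>", "</reasoning>")}
        out.append(m)
    return out
```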

_Just to be clear: I'm not asking to change how the `.thinking` property works in the output or UI. That part's great. This is only about preserving the full message input that gets sent back to the LLM._

**Feature Request**
Introduce a flag, option, or any other mechanism to disable the automatic filtering, or to specify how many past "think" blocks to keep. Something like:

```
{
  "model": "qwen3:4b",
  "messages": [
    {
      "role": "user",
      "content": "hello world!"
    }
  ],
  "think": true,
  "stream": false,
  "preserve_think_block": true    // preserves <think> block in .content (false by default)
}
```

This would bypass `filterThinkTags()` and preserve `.content` exactly as submitted. Sometimes you need the model to see exactly what it said before, including its thought process. Right now that information gets stripped before it reaches the model, which breaks multi-turn logic, memory chaining, and agent behavior. A simple opt-out would make things more predictable and easier to debug.
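
For example, a client call against the standard `/api/chat` endpoint might look like this (a hedged Python sketch; `preserve_think_block` is the proposed flag and does not exist today):

```
import requests

# Usage sketch assuming the PROPOSED "preserve_think_block" flag existed;
# everything else here is the standard Ollama /api/chat request shape.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:4b",
        "messages": [{"role": "user", "content": "hello world!"}],
        "think": True,
        "stream": False,
        "preserve_think_block": True,  # proposed opt-out of filterThinkTags()
    },
)
print(resp.json()["message"]["content"])
```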

GiteaMirror added the feature request label 2026-04-22 15:38:06 -05:00