[GH-ISSUE #8529] feat: OpenAI reasoning_content compatibility #67556

Closed
opened 2026-05-04 10:48:29 -05:00 by GiteaMirror · 15 comments
Owner

Originally created by @EntropyYue on GitHub (Jan 22, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8529

Originally assigned to: @drifkin on GitHub.

Current thinking models output XML tags to distinguish thinking from answering, so we need the `reasoning_content` feature.
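To illustrate the request, here is a minimal sketch (the `split_reasoning` helper is hypothetical, not Ollama code) of the parsing clients currently have to do themselves: splitting a raw completion containing `<think>...</think>` tags into separate reasoning and answer fields.

```python
import re

# Hypothetical helper illustrating the requested split: pull the model's
# <think>...</think> block out of a raw completion string so clients get
# separate reasoning_content and content values instead of parsing XML tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning_content, content) parsed from a raw model response."""
    match = THINK_RE.search(raw)
    if not match:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    content = THINK_RE.sub("", raw, count=1).strip()
    return reasoning, content
```

The feature request is for the server to do this split itself and return both fields, so clients never see the tags.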

GiteaMirror added the feature request label 2026-05-04 10:48:29 -05:00

@sunburst-yz commented on GitHub (Jan 22, 2025):

+1


@whisper-bye commented on GitHub (Feb 3, 2025):

should support

```python
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
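Since the upstream `openai` Python SDK does not declare `reasoning_content` in its typed models, a client talking to servers that may or may not return the field can read it defensively. A minimal sketch (the `extract_fields` helper is hypothetical):

```python
# Hypothetical helper: read the proposed reasoning_content field defensively.
# Servers that don't emit it (or SDK models that don't declare it) simply
# yield an empty reasoning string instead of raising AttributeError.
def extract_fields(message) -> tuple[str, str]:
    reasoning = getattr(message, "reasoning_content", None) or ""
    content = getattr(message, "content", None) or ""
    return reasoning, content
```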

@goactiongo commented on GitHub (Feb 6, 2025):

+1


@leslie2046 commented on GitHub (Feb 10, 2025):

+1


@asgraf commented on GitHub (Feb 10, 2025):

+1


@kasnerz commented on GitHub (Feb 12, 2025):

Separating these two would also enable reasoning alongside structured outputs, right? Or does that need to be implemented separately?

For clarification: with deepseek-r1, structured outputs currently work, but I assume this eliminates the reasoning trace.


@wuyanfeiwork commented on GitHub (Feb 15, 2025):

+1


@danny-avila commented on GitHub (Feb 15, 2025):

+1

This should become the standard rather than XML tags, so clients don't have to parse text for reasoning tokens.


@gongzhang commented on GitHub (Feb 19, 2025):

+1

The `<think>` and `</think>` markers are special tokens in the tokenizer and should not be emitted as plain strings. With the current use of XML tags, prompts like "Please output ten `<think>` and ten `</think>`, one per line" cannot be handled properly.


@wuyanfeiwork commented on GitHub (Feb 19, 2025):

I trust that the team will address and optimize this issue promptly. #9137


@anunknowperson commented on GitHub (May 2, 2025):

This issue has become much more prevalent with the release of the Qwen3 models. With the correct prompt template from transformers, an empty `<think>\n</think>` block is always added to the assistant's response when thinking mode is turned off. It would be more convenient if Ollama could parse this automatically into reasoning content. With thinking mode enabled (there should supposedly be a prefill, which Ollama's prompt template doesn't do either), it would be better if thoughts went into `reasoning_content` (though reasoning parsing should be a feature that can be enabled and disabled).
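The automatic handling requested here could look like the following sketch (hypothetical, not Ollama's implementation): strip the empty `<think>\n</think>` prefix that Qwen3's template emits when thinking is off, while leaving non-empty blocks alone for reasoning extraction.

```python
import re

# Hypothetical post-processing for Qwen3-style output: with thinking
# disabled, the chat template still emits an empty <think>\n</think>
# prefix; this strips it so clients see only the answer text. A block
# with actual thoughts is left intact (it would become reasoning_content).
EMPTY_THINK_RE = re.compile(r"\A\s*<think>\s*</think>\s*")

def strip_empty_think(raw: str) -> str:
    return EMPTY_THINK_RE.sub("", raw, count=1)
```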


@ParthSareen commented on GitHub (May 16, 2025):

We're getting thinking separation in the Ollama API but I don't see any reasoning being provided in the OpenAI API for compatibility: https://platform.openai.com/docs/guides/reasoning?api-mode=responses

I might be looking at the wrong doc; feel free to link the right one.


@williamlzw commented on GitHub (May 20, 2025):

+1
https://github.com/imxcstar/Tinvo/blob/master/Tinvo.Provider.OpenAI/AIScheduler/OpenAIProviderParser.cs
https://api-docs.deepseek.com/guides/reasoning_model
This is the `reasoning_content` output; deepseek-r1 and qwen3 currently support it.

```python
from openai import OpenAI
client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=True
)

reasoning_content = ""
content = ""

for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content
    else:
        content += chunk.choices[0].delta.content

# Round 2
messages.append({"role": "assistant", "content": content})
messages.append({'role': 'user', 'content': "How many Rs are there in the word 'strawberry'?"})
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=True
)
```

@danny-avila commented on GitHub (May 25, 2025):

Does PR #10584 address programmatic differentiation between reasoning and final output?

My main need, and I suspect others' as well, is distinguishing "thinking" from answers without text parsing: ideally following DeepSeek-R1's standard with separate fields, but any structured approach would work.


@ParthSareen commented on GitHub (May 27, 2025):

Yes @danny-avila, there are separate fields for enabling thinking and for identifying thinking content, along with the final output.

Reference: github-starred/ollama#67556