[GH-ISSUE #11738] GPT-OSS: Hiding ChatCompletions reasoning traces with reasoning.exclude=True #7775

Open
opened 2026-04-12 19:56:12 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @lukestanley on GitHub (Aug 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11738

@drifkin Please correct me if I am wrong, I think the reasoning effort parameter is implemented but the exclude parameter is not implemented yet.

OpenRouter has more complete Chat Completions specification for reasoning parameter support.
It makes sense that yesterday in OpenAI's Cookbook Dominik Kundel suggests API implementations copy the OpenRouter API:

If you are implementing a Chat Completions API, there is no official spec for handling chain of thought in the published OpenAI specs, as our hosted models will not offer this feature for the time being. We ask you to follow the following convention from OpenRouter instead.

OpenRouter uses the format below:

{
  "model": "your-model",
  "messages": [],
  "reasoning": {
    // One of the following (not both):
    "effort": "high", // Can be "high", "medium", or "low" (OpenAI-style)
    "max_tokens": 2000, // Specific token limit (Anthropic-style)

    // Optional: Default is false. All models support this.
    "exclude": false, // Set to true to exclude reasoning tokens from response

    // Or enable reasoning with the default parameters:
    "enabled": true // Default: inferred from `effort` or `max_tokens`
  }
}

( taken from https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens )

Implementing the exclude property would be a good next step for better support.

Other aspects could be good to implement after.

This would help with https://github.com/ollama/ollama/issues/11689

Originally created by @lukestanley on GitHub (Aug 6, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/11738 @drifkin Please correct me if I am wrong, I think [the reasoning effort parameter is implemented](https://github.com/ollama/ollama/blob/4742e12c2360bd2b43aedcf6d11cefc3a048f791/openai/openai.go#L517-L539) but the exclude parameter is not implemented yet. OpenRouter has more complete Chat Completions specification for reasoning parameter support. It makes sense that yesterday in OpenAI's Cookbook [Dominik Kundel](https://github.com/openai/openai-cookbook/pull/1982) suggests API implementations [copy the OpenRouter API](https://cookbook.openai.com/articles/gpt-oss/handle-raw-cot#chat-completions-api): > If you are implementing a Chat Completions API, there is no official spec for handling chain of thought in the published OpenAI specs, as our hosted models will not offer this feature for the time being. We ask you to follow [the following convention from OpenRouter instead](https://openrouter.ai/docs/use-cases/reasoning-tokens). OpenRouter uses the format below: ```js { "model": "your-model", "messages": [], "reasoning": { // One of the following (not both): "effort": "high", // Can be "high", "medium", or "low" (OpenAI-style) "max_tokens": 2000, // Specific token limit (Anthropic-style) // Optional: Default is false. All models support this. "exclude": false, // Set to true to exclude reasoning tokens from response // Or enable reasoning with the default parameters: "enabled": true // Default: inferred from `effort` or `max_tokens` } } ``` ( taken from https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens ) Implementing the `exclude` property would be a good next step for better support. Other aspects could be good to implement after. This would help with https://github.com/ollama/ollama/issues/11689
GiteaMirror added the feature request label 2026-04-12 19:56:12 -05:00
Author
Owner

@drifkin commented on GitHub (Aug 7, 2025):

that's correct, for the compat layer we don't yet implement exclude. Previously we've been reluctant to add extensions that OAI themselves don't use, but now things are changing with these new models.

For gpt-oss we always parse thinking out into a separate field, so you can just ignore it if you don't want to use it. Are you hoping for this field to avoid sending data you're going to ignore anyway?

<!-- gh-comment-id:3162135468 --> @drifkin commented on GitHub (Aug 7, 2025): that's correct, for the compat layer we don't yet implement `exclude`. Previously we've been reluctant to add extensions that OAI themselves don't use, but now things are changing with these new models. For gpt-oss we always parse thinking out into a separate field, so you can just ignore it if you don't want to use it. Are you hoping for this field to avoid sending data you're going to ignore anyway?
Author
Owner

@debruyckere commented on GitHub (Aug 8, 2025):

I'm using the OpenAI .Net SDK to connect with Ollama. There is no exclude reasoning option available.
https://github.com/openai/openai-dotnet/tree/OpenAI_2.3.0

I found here that the API shouldn't return reasoning tokens:
"While reasoning tokens are not visible via the API ..."
https://platform.openai.com/docs/guides/reasoning

Yet Ollama does return them by default when using gpt-oss (20b). There doesn't seem to be a way to diferentiate them from other output (no XML tags or so).

<!-- gh-comment-id:3167693928 --> @debruyckere commented on GitHub (Aug 8, 2025): I'm using the OpenAI .Net SDK to connect with Ollama. There is no exclude reasoning option available. https://github.com/openai/openai-dotnet/tree/OpenAI_2.3.0 I found here that the API shouldn't return reasoning tokens: "While reasoning tokens are not visible via the API ..." https://platform.openai.com/docs/guides/reasoning Yet Ollama does return them by default when using gpt-oss (20b). There doesn't seem to be a way to diferentiate them from other output (no XML tags or so).
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#7775