[GH-ISSUE #10538] Structured outputs for reasoning models / thinking-mode #68993


GiteaMirror · 2026-05-04T16:42:33-05:00

GiteaMirror commented

2026-05-04 16:42:33 -05:00

Originally created by @bjoernhommel on GitHub (May 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10538

Originally assigned to: @drifkin, @ParthSareen on GitHub.

**Problem**: As far as I know, [structured outputs](https://ollama.com/blog/structured-outputs) and reasoning models / thinking mode are currently incompatible. As I understand it, using `format` [suppresses the next-token probability](https://blog.danielclayton.co.uk/posts/ollama-structured-outputs/) of any token that does not comply with the desired output structure, thereby "forcing" structured generation.

I assume the reason for the incompatibility with reasoning models is that the probability of the `<think>` token is also set to zero, effectively disabling thinking mode for structured outputs.

**Proposed solution(s)**:
As @kasnerz suggested [here](https://github.com/ollama/ollama/issues/8529#issuecomment-2653717646), **separating thinking output from the rest of the response** should fix this.

A more hacky solution could be to first suppress token probabilities only for output outside of the thinking tags, and then prune the thinking content from the final output.
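The "hacky" variant can be illustrated with a toy decoder. This is only a sketch under stated assumptions, not how Ollama implements `format`: the per-step logits are hard-coded stand-ins for a model, `toy_grammar` is a hypothetical five-token grammar for `{"answer": true|false}`, and the model is assumed to open with a thinking block. Tokens are left unconstrained until `</think>` appears, the mask applies only afterwards, and the thinking tokens are returned separately so they can be pruned from the final output.

```python
import json
import math

THINK_CLOSE = "</think>"

def mask_logits(logits, allowed):
    """Set the logit of every token outside `allowed` to -inf."""
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

def toy_grammar(prefix):
    """Allowed next tokens for the toy schema {"answer": true|false} (hypothetical)."""
    sequence = [{"{"}, {'"answer"'}, {":"}, {"true", "false"}, {"}"}]
    return sequence[len(prefix)] if len(prefix) < len(sequence) else set()

def decode(step_logits, grammar):
    """Greedy decoding that masks logits only after the thinking block closes."""
    thinking, structured = [], []
    in_thinking = True  # assumption: the model starts with a thinking block
    for logits in step_logits:
        if not in_thinking:
            allowed = grammar(structured)
            if not allowed:  # the grammar is complete
                break
            logits = mask_logits(logits, allowed)
        token = max(logits, key=logits.get)
        if in_thinking:
            thinking.append(token)
            if token == THINK_CLOSE:
                in_thinking = False
        else:
            structured.append(token)
    return thinking, structured

# Hard-coded logits standing in for a model. After </think> the highest-scoring
# tokens ("Yes", "maybe", "!") violate the grammar and are masked out.
STEPS = [
    {"<think>": 2.0, "{": 1.0},
    {"sky": 2.0, "{": 0.5},
    {"</think>": 3.0, "sky": 1.0},
    {"Yes": 5.0, "{": 1.0},
    {'"answer"': 1.0, "Yes": 4.0},
    {":": 1.0},
    {"true": 2.0, "false": 1.0, "maybe": 9.0},
    {"}": 0.1, "!": 3.0},
]

thinking, structured = decode(STEPS, toy_grammar)
print("".join(structured))  # {"answer":true}
```

Returning the two token lists separately is one way to do the pruning step: the caller receives the structured part on its own and can keep or drop the thinking.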

GiteaMirror added the feature request label 2026-05-04 16:42:33 -05:00

GiteaMirror commented

2026-05-04 16:42:36 -05:00

@rick-github commented on GitHub (May 2, 2025):

What issues are you seeing with structured outputs and reasoning models? I did a quick check and got structured output from qwen3. Admittedly, it's a simple test and maybe not the use case you are shooting for.

```python
#!/usr/bin/env python3

import argparse

import ollama
from pydantic import BaseModel


class Answer(BaseModel):
  response: bool
  thinking: str
  reasoning: str

  def __str__(self):
    return f"response: {self.response}\nthinking: {self.thinking}\nreasoning: {self.reasoning}"


parser = argparse.ArgumentParser()
parser.add_argument("--prompt", default="Is the sky blue?")
parser.add_argument("--model", default="qwen3")
args = parser.parse_args()


def answer(prompt, use_structured_output=False):
  # Pass the JSON schema as `format` only when structured output is requested.
  schema = Answer.model_json_schema() if use_structured_output else None
  response = ollama.chat(
      model=args.model,
      messages=[
        {"role": "user", "content": prompt}
      ],
      options={"temperature": 0, "seed": 0},
      format=schema)
  if use_structured_output:
    # Validate the constrained response against the same schema.
    return Answer.model_validate_json(response.message.content)
  return response.message.content


print(answer(args.prompt, False))  # free-form response, includes the <think> block
print("*******")
print(answer(args.prompt, True))   # schema-constrained response
```

```console
$ ./yesno.py
<think>
Okay, the user is asking if the sky is blue. I need to explain why the sky appears blue. Let me start by recalling the basic physics behind this. I remember that sunlight is white, but it's made up of different colors, each with different wavelengths. When sunlight enters Earth's atmosphere, the molecules in the air scatter the light. 

Wait, which part of the spectrum is scattered more? Oh right, Rayleigh scattering. Shorter wavelengths like blue and violet are scattered more than longer ones like red or yellow. But why do we see blue instead of violet? Maybe because the sun emits more blue light than violet, and our eyes are more sensitive to blue. Also, some of the violet light is absorbed by the upper atmosphere.

So the scattered blue light comes from all directions, making the sky appear blue. But during sunrise or sunset, the light has to pass through more atmosphere, so the blue light is scattered away, leaving the longer wavelengths like red and orange. That's why the sky turns red or orange during those times.

I should also mention that the actual color of the sky can vary depending on weather and pollution. For example, on hazy days, the sky might look more gray or white because the particles in the air scatter light differently. Also, in space, without an atmosphere, the sky would appear black because there's no scattering of light.

Wait, did I cover all the key points? Let me check. The main reason is Rayleigh scattering, the role of different wavelengths, the dominance of blue over violet, and the exceptions like sunrise/sunset and atmospheric conditions. Yeah, that should cover it. I should present this in a clear, conversational way without getting too technical. Make sure to explain the concepts simply so it's easy to understand.
</think>

Yes, the sky appears blue due to a phenomenon called **Rayleigh scattering**. Here's a simple explanation:

1. **Sunlight and Colors**: Sunlight is white, but it consists of all colors of the visible spectrum (red, orange, yellow, green, blue, indigo, violet). Each color has a different wavelength.

2. **Atmospheric Scattering**: When sunlight enters Earth's atmosphere, it interacts with gas molecules and small particles. Shorter wavelengths (like blue and violet) are scattered more than longer wavelengths (like red or yellow) by these particles. This is called **Rayleigh scattering**.

3. **Why Blue, Not Violet**: While violet light is scattered even more than blue, the sun emits less violet light, and our eyes are more sensitive to blue. Additionally, some violet light is absorbed by the upper atmosphere. This combination makes the sky appear **blue** to us.

4. **Exceptions**: 
   - **Sunrise/Sunset**: During these times, sunlight travels through more atmosphere, scattering out the blue light and leaving reds and oranges to dominate.
   - **Pollution/Weather**: Hazy or polluted air can scatter light differently, making the sky appear gray or white.

5. **In Space**: Without an atmosphere, the sky would be black because there are no molecules to scatter sunlight.

So, the blue sky is a result of how Earth's atmosphere interacts with sunlight! 🌈
*******
response: True
thinking: Okay, the user is asking if the sky is blue. I know that the sky appears blue due to Rayleigh scattering. Let me recall how that works. Sunlight is made up of different colors, each with different wavelengths. When sunlight enters Earth's atmosphere, the shorter wavelengths (like blue and violet) scatter more than the longer ones (like red and yellow). But why do we see blue instead of violet? Well, the sun emits more blue light than violet, and our eyes are more sensitive to blue. Also, some of the violet light is absorbed by the upper atmosphere. So the scattered blue light fills the sky, making it appear blue. However, during sunrise or sunset, the light has to pass through more atmosphere, scattering the blue light out of our line of sight, which is why the sky turns red or orange. I should also mention that the sky can appear different colors under various conditions, like pollution or weather. But the main reason is Rayleigh scattering. Let me make sure I'm not missing any other factors. Maybe mention that the color can vary depending on the time of day and atmospheric conditions. Yeah, that's right. So the answer is yes, the sky is blue, but with some nuances.
reasoning: The sky appears blue due to Rayleigh scattering, where shorter wavelengths of light (blue and violet) are scattered more by the atmosphere. The sun emits more blue light, and our eyes are more sensitive to it, making the sky appear blue. During sunrise or sunset, the light travels a longer path, scattering blue light out of our view, resulting in red or orange hues. The color can also vary with atmospheric conditions.
```

GiteaMirror commented

2026-05-04 16:42:38 -05:00

@nicolab28 commented on GitHub (May 3, 2025):

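I use the AI SDK, and I add a "thinking" string field to my JSON schema.
With the qwen3 4b, 8b, 30b, and 32b models it works and I can see the model's thinking,
but the 14b model returns nothing at all, just an empty JSON object.
On the other hand, some responses include a `$schema` key, and I have even had the whole schema come back in the response.

Here are a few examples (the prompts are French for "could you bring me a beer?" and "turn on the kitchen light for 10 seconds"):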

modelFullName ollama/qwen3:4b-q8_0
prompt: tu pourrais m'apporter une bière ?
lastObject {
"thinking": "L'utilisateur demande d'apporter une bière. Cependant, je n'ai pas d'appareil pour cela. Je dois informer qu'il ne peut pas faire cela.",
"message": "Désolé, je ne peux pas apporter de bière. Je n'ai pas d'appareil pour cela.",
"toolCalls": []
}

modelFullName ollama/qwen3:8b
prompt: tu pourrais m'apporter une bière ?
lastObject {
"thinking": "L'utilisateur demande à avoir une bière. Je dois vérifier si je peux effectuer cette action. Je n'ai pas d'outil pour apporter une bière. Je dois informer l'utilisateur que je ne peux pas répondre à cette demande et préciser que ma mise à jour est nécessaire.",
"message": "Désolé, je ne peux pas apporter de bière. Une mise à jour de mon système est nécessaire pour effectuer cette action.",
"toolCalls": [],
"$schema": "http://json-schema.org/draft-07/schema#"
}

modelFullName ollama/qwen3:14b
prompt: tu pourrais m'apporter une bière ?
lastObject {}

modelFullName ollama/qwen3:30b-a3b-q4_K_M
prompt: tu pourrais m'apporter une bière ?
lastObject {
"thinking": "L'utilisateur demande une bière. Je n'ai pas d'outil pour apporter une bière. Je dois lui dire que je ne peux pas le faire.",
"message": "Je ne peux pas vous apporter une bière. Je suis un assistant vocal.",
"toolCalls": [],
"$schema": "http://json-schema.org/draft-07/schema#"
}

modelFullName ollama/qwen3:32b-q4_K_M
prompt: tu pourrais m'apporter une bière ?
lastObject {
"thinking": "Je ne peux pas apporter de bière à Nicolas, car je n'ai pas de fonctionnalité permettant de manipuler des objets physiques. Je peux suggérer qu'il aille en chercher une lui-même.",
"message": "Désolé, je ne peux pas apporter de bière. Tu devras aller en chercher une toi-même.",
"toolCalls": []
}

modelFullName ollama/qwen3:4b-q8_0
prompt: allume la lumière de la cuisine pendant 10 secondes
lastObject {
"thinking": "L'utilisateur demande d'allumer la lumière de la cuisine pendant 10 secondes. Je dois utiliser l'outil 'onoff' pour allumer la cuisine et ensuite 'wait' pour attendre 10 secondes.",
"message": "La lumière de la cuisine est allumée. Elle restera allumée pendant 10 secondes.",
"toolCalls": [
{
"tool": "onoff",
"args": {
"dev": "CUISINE",
"on": true
}
},
{
"tool": "wait",
"args": {
"sec": 10
}
}
]
}

modelFullName ollama/qwen3:8b
prompt: allume la lumière de la cuisine pendant 10 secondes
lastObject {
"thinking": "L'utilisateur demande d'allumer la lumière de la cuisine pendant 10 secondes. Je dois d'abord allumer la lumière, puis attendre 10 secondes avant de l'éteindre.",
"message": "Je vais allumer la lumière de la cuisine pendant 10 secondes.",
"toolCalls": [
{
"tool": "onoff",
"args": {
"dev": "CUISINE",
"on": true
}
},
{
"tool": "wait",
"args": {
"sec": 10
}
},
{
"tool": "onoff",
"args": {
"dev": "CUISINE",
"on": false
}
}
]
}

modelFullName ollama/qwen3:14b
prompt: allume la lumière de la cuisine pendant 10 secondes
lastObject {}

modelFullName ollama/qwen3:30b-a3b-q4_K_M
prompt: allume la lumière de la cuisine pendant 10 secondes
lastObject {
"thinking": "L'utilisateur demande d'allumer la lumière de la cuisine pendant 10 secondes. Pour cela, je dois d'abord allumer l'appareil CUISINE en utilisant le tool onoff. Ensuite, je dois attendre 10 secondes avec le tool wait. Enfin, je dois éteindre l'appareil CUISINE avec le tool onoff. Je vais donc appeler les outils onoff, wait, puis onoff à nouveau.",
"message": "Je vais allumer la lumière de la cuisine pendant 10 secondes.",
"toolCalls": [
{
"tool": "onoff",
"args": {
"dev": "CUISINE",
"on": true
}
},
{
"tool": "wait",
"args": {
"sec": 10
}
},
{
"tool": "onoff",
"args": {
"dev": "CUISINE",
"on": false
}
}
]
}

modelFullName ollama/qwen3:32b-q4_K_M
prompt: allume la lumière de la cuisine pendant 10 secondes
lastObject {
"thinking": "L'utilisateur souhaite allumer la lumière de la cuisine pendant 10 secondes. Je dois d'abord allumer l'appareil, puis l'éteindre après le délai.",
"message": "Je vais allumer la lumière de la cuisine et l'éteindre après 10 secondes.",
"toolCalls": [
{
"tool": "onoff",
"args": {
"dev": "CUISINE",
"on": true
}
},
{
"tool": "wait",
"args": {
"sec": 10
}
},
{
"tool": "onoff",
"args": {
"dev": "CUISINE",
"on": false
}
}
]
}

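The two symptoms reported above (an empty object and a stray `$schema` key) can be caught on the client before the response is used. The following is a minimal sketch, not part of the AI SDK or Ollama: `check_response` and `REQUIRED_KEYS` are hypothetical names, and the required fields are taken from the example objects in this comment.

```python
import json

# Field names taken from the example responses above (assumed to be required).
REQUIRED_KEYS = ("thinking", "message", "toolCalls")

def check_response(raw: str) -> tuple[dict, list[str]]:
    """Parse a model response, drop a stray "$schema" key, and list problems."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    problems = []
    if not obj:
        problems.append("empty object")
    if "$schema" in obj:
        problems.append("stray $schema key")
        obj = {k: v for k, v in obj.items() if k != "$schema"}
    missing = [k for k in REQUIRED_KEYS if k not in obj]
    if obj and missing:
        problems.append("missing keys: " + ", ".join(missing))
    return obj, problems

# The qwen3:14b case from above: an empty object is flagged, not accepted.
print(check_response("{}"))  # ({}, ['empty object'])
```

A caller could retry or fall back to another model when the problem list is non-empty; that policy is left out here on purpose.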

GiteaMirror commented

2026-05-04 16:42:41 -05:00

@rick-github commented on GitHub (May 3, 2025):

If other models work and qwen3:14b doesn't, that would seem to be a problem with the model.


GiteaMirror commented

2026-05-04 16:42:42 -05:00

@functorism commented on GitHub (May 16, 2025):

@rick-github Can you please read the original issue again.

@bjoernhommel You're absolutely right - there indeed needs to exist more flexibility/control over the logit biasing for structured output when reasoning models are used.

The entire point here is that using a reasoning model with structured output should not **require full omission of reasoning**. Ollama needs to facilitate the co-existence of reasoning + structured output.

GiteaMirror commented

2026-05-04 16:42:44 -05:00

@functorism commented on GitHub (May 16, 2025):

![Image](https://github.com/user-attachments/assets/f1add1e1-31b8-4945-ae46-057eadb6dee9)

https://github.com/ollama/ollama/pull/10584 - WIP thinking API support

GiteaMirror commented

2026-05-04 16:42:48 -05:00

@ParthSareen commented on GitHub (May 16, 2025):

Once the thinking API lands, I can take a look at how to generate the thinking first and then apply the masks only to the content that follows. In an ideal world, we can have "thinking" enabled and then produce structured output after that portion. For now, though, the example @rick-github gave is a good way to go about it.
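Until something like this exists on the server, the same ordering can be approximated from the client with two requests. The sketch below is an assumption-laden workaround, not the planned implementation: `think_then_structure` is a hypothetical helper, the follow-up prompt wording is invented, and `chat` is any callable with the keyword interface of `ollama.chat` (`model`, `messages`, `format`). A stub stands in for the server so the example runs offline.

```python
import json
from types import SimpleNamespace

def think_then_structure(chat, model, prompt, schema):
    """Request free-form reasoning first, then a schema-constrained answer."""
    messages = [{"role": "user", "content": prompt}]
    # First pass: no `format`, so the reasoning is not constrained.
    first = chat(model=model, messages=messages)
    reasoning = first.message.content
    messages += [
        {"role": "assistant", "content": reasoning},
        {"role": "user", "content": "Now give only the final answer as JSON."},
    ]
    # Second pass: `format` constrains only the final answer.
    second = chat(model=model, messages=messages, format=schema)
    return reasoning, json.loads(second.message.content)

def fake_chat(model, messages, format=None):
    """Offline stub; replace with `ollama.chat` to talk to a real server."""
    content = '{"response": true}' if format else "<think>Rayleigh scattering.</think> Yes."
    return SimpleNamespace(message=SimpleNamespace(content=content))

schema = {
    "type": "object",
    "properties": {"response": {"type": "boolean"}},
    "required": ["response"],
}
reasoning, result = think_then_structure(fake_chat, "qwen3", "Is the sky blue?", schema)
print(result)  # {'response': True}
```

The cost is a second request and a longer context, which is part of why doing this inside a single generation would be preferable.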

GiteaMirror commented

2026-05-04 16:42:51 -05:00

@iamFIREcracker commented on GitHub (Jun 27, 2025):

Assuming we could indeed get an LLM to output both thinking and structured output, would there be any advantages to the solution proposed by @rick-github? I mean, does anyone have any idea whether reasoning LLMs produce better chains of thought inside the standard `<think>` tags than in a structured output that includes one slot for the reasoning? I would assume the chain of thought inside the `<think>` tag to be qualitatively better, as the model was trained on it, but I have no data to back up this assumption.

GiteaMirror commented

2026-05-04 16:42:57 -05:00

@morrisalp commented on GitHub (Aug 17, 2025):

Too bad this is not supported now! I am migrating from ollama to llama.cpp specifically because of this issue.
Impressionistically, I found using a reasoning tag in JSON output performs much worse.
Supporting GBNF would also solve this issue but it seems like it's not being considered (#6237 #11911 #5899) for reasons that I don't fully understand.


GiteaMirror commented

2026-05-04 16:42:59 -05:00

@sjsone commented on GitHub (Aug 23, 2025):

@morrisalp I had the same issue and created a MR https://github.com/ollama/ollama/pull/12055
Please test it and give the MR some props


GiteaMirror commented

2026-05-04 16:43:00 -05:00

@guilhermecxe commented on GitHub (Sep 25, 2025):

> Assuming we could indeed get an LLM to output both thinking and structured output, would there be any advantages to the solution proposed by [@rick-github](https://github.com/rick-github)? I mean, does anyone have any idea whether reasoning LLMs produce better chains of thought inside the standard `<think>` tags than in a structured output that includes one slot for the reasoning? I would assume the chain of thought inside the `<think>` tag to be qualitatively better, as the model was trained on it, but I have no data to back up this assumption.

The paper that showed the benefits of CoT (https://arxiv.org/abs/2201.11903) concluded that exposing the reasoning after the answer was not as effective as reasoning before the answer, so I think that if we ensure the thinking is generated before the answer of interest, we can get similar results, yes.
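If the reasoning lives in a JSON field, the ordering point suggests declaring that field before the answer field. Here is a minimal sketch, assuming the backend emits properties in the order they are declared (worth verifying for a given backend and version); `ANSWER_SCHEMA` and `answer_comes_last` are hypothetical names. Note that the `Answer` model earlier in this thread declares `response` first, which under the same assumption would produce the answer before the thinking.

```python
import json

# Reasoning is declared before the answer so that a decoder which follows
# declaration order writes the reasoning first.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "response": {"type": "boolean"},
    },
    "required": ["reasoning", "response"],
}

def field_order(schema):
    """Return property names in declaration order (dicts preserve insertion order)."""
    return list(schema["properties"])

def answer_comes_last(schema, answer_field):
    """Check that the answer field is declared after every other field."""
    return field_order(schema)[-1] == answer_field

print(field_order(ANSWER_SCHEMA))                    # ['reasoning', 'response']
print(answer_comes_last(ANSWER_SCHEMA, "response"))  # True

# Serialising to JSON keeps the same order, so it survives the request body.
payload = json.dumps(ANSWER_SCHEMA)
```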


Reference: github-starred/ollama#68993