[GH-ISSUE #10538] Structured outputs for reasoning models / thinking-mode #68993

Open
opened 2026-05-04 16:42:33 -05:00 by GiteaMirror · 10 comments
Owner

Originally created by @bjoernhommel on GitHub (May 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10538

Originally assigned to: @drifkin, @ParthSareen on GitHub.

**Problem**: As far as I know, [structured outputs](https://ollama.com/blog/structured-outputs) and reasoning models / thinking mode are currently incompatible. As I understand it, using `format` [suppresses the next-token probability](https://blog.danielclayton.co.uk/posts/ollama-structured-outputs/) of any token that does not comply with the desired output structure, thereby "forcing" structured generation.

I assume the incompatibility with reasoning models arises because the probability of the `<think>` token is also set to zero, effectively disabling thinking mode for structured outputs.
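The toy sketch below makes the suspected mechanism concrete. It is not Ollama's actual sampler: the token strings, logit values and the `softmax`/`apply_format_mask` helpers are hypothetical. It shows how masking every token a JSON grammar forbids at the first step sends the probability of `<think>` to zero.

```python
import math


def softmax(logits):
    """Convert a token -> logit mapping into token -> probability."""
    m = max(logits.values())
    # math.exp(-inf) == 0.0, so masked tokens end up with zero probability.
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def apply_format_mask(logits, allowed):
    """Structured-output masking: every token the grammar does not allow gets -inf."""
    return {t: (v if t in allowed else -math.inf) for t, v in logits.items()}


# Hypothetical first-step logits: the model strongly prefers to open a think block.
logits = {"<think>": 4.0, "{": 1.0, "Yes": 0.5}

unmasked = softmax(logits)
# A JSON-object grammar only allows "{" as the very first token.
masked = softmax(apply_format_mask(logits, allowed={"{"}))
```

Without the mask, `<think>` is by far the most likely first token. With the mask, all probability mass moves to `{`, so the model can never enter its thinking phase.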

**Proposed solution(s)**: As @kasnerz suggested [here](https://github.com/ollama/ollama/issues/8529#issuecomment-2653717646), **separating the thinking output from the rest of the response** should fix this.

A hackier solution could be to, first, suppress token probabilities only for output outside the thinking tags and, second, prune the thinking content from the final output.
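Here is a minimal sketch of the pruning step from the second proposal. It assumes the model emits a single leading `<think>...</think>` block, as qwen3 does in the unconstrained run later in this thread. `split_thinking` is a hypothetical helper, not part of the Ollama client.

```python
import json
import re

# Matches a leading <think>...</think> block; DOTALL lets the reasoning span lines.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_thinking(raw):
    """Separate the reasoning from the answer in a raw completion.

    Returns (thinking, content); thinking is None when no think block is present.
    """
    match = THINK_RE.match(raw)
    if match is None:
        return None, raw.strip()
    return match.group(1).strip(), raw[match.end():].strip()


raw = '<think>\nThe sky scatters short wavelengths.\n</think>\n{"response": true}'
thinking, content = split_thinking(raw)
data = json.loads(content)  # only the pruned content needs to be valid JSON
```

Only `content` is validated against the schema; `thinking` can be logged or discarded.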

GiteaMirror added the feature request label 2026-05-04 16:42:33 -05:00

@rick-github commented on GitHub (May 2, 2025):

What issues are you seeing with structured outputs and reasoning models? I did a quick check and got structured output from qwen3. Admittedly, it's a simple test and maybe not the use case you are shooting for.

```python
#!/usr/bin/env python3

import argparse

import ollama
from pydantic import BaseModel


class Answer(BaseModel):
    response: bool
    # Free-text fields in the schema let the model "think" even though
    # its output is constrained to JSON.
    thinking: str
    reasoning: str

    def __str__(self):
        return f"response: {self.response}\nthinking: {self.thinking}\nreasoning: {self.reasoning}"


parser = argparse.ArgumentParser()
parser.add_argument("--prompt", default="Is the sky blue?")
parser.add_argument("--model", default="qwen3")
args = parser.parse_args()


def answer(prompt, use_structured_output=False):
    # Pass the JSON schema as `format` only when structured output is requested.
    schema = Answer.model_json_schema() if use_structured_output else None
    response = ollama.chat(
        model=args.model,
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0, "seed": 0},  # deterministic, for comparison
        format=schema,
    )
    if use_structured_output:
        # Parse the constrained output and validate it against the schema.
        return Answer.model_validate_json(response.message.content)
    return response.message.content


print(answer(args.prompt, False))
print("*******")
print(answer(args.prompt, True))
```
```console
$ ./yesno.py
<think>
Okay, the user is asking if the sky is blue. I need to explain why the sky appears blue. Let me start by recalling the basic physics behind this. I remember that sunlight is white, but it's made up of different colors, each with different wavelengths. When sunlight enters Earth's atmosphere, the molecules in the air scatter the light.

Wait, which part of the spectrum is scattered more? Oh right, Rayleigh scattering. Shorter wavelengths like blue and violet are scattered more than longer ones like red or yellow. But why do we see blue instead of violet? Maybe because the sun emits more blue light than violet, and our eyes are more sensitive to blue. Also, some of the violet light is absorbed by the upper atmosphere.

So the scattered blue light comes from all directions, making the sky appear blue. But during sunrise or sunset, the light has to pass through more atmosphere, so the blue light is scattered away, leaving the longer wavelengths like red and orange. That's why the sky turns red or orange during those times.

I should also mention that the actual color of the sky can vary depending on weather and pollution. For example, on hazy days, the sky might look more gray or white because the particles in the air scatter light differently. Also, in space, without an atmosphere, the sky would appear black because there's no scattering of light.

Wait, did I cover all the key points? Let me check. The main reason is Rayleigh scattering, the role of different wavelengths, the dominance of blue over violet, and the exceptions like sunrise/sunset and atmospheric conditions. Yeah, that should cover it. I should present this in a clear, conversational way without getting too technical. Make sure to explain the concepts simply so it's easy to understand.
</think>

Yes, the sky appears blue due to a phenomenon called **Rayleigh scattering**. Here's a simple explanation:

1. **Sunlight and Colors**: Sunlight is white, but it consists of all colors of the visible spectrum (red, orange, yellow, green, blue, indigo, violet). Each color has a different wavelength.

2. **Atmospheric Scattering**: When sunlight enters Earth's atmosphere, it interacts with gas molecules and small particles. Shorter wavelengths (like blue and violet) are scattered more than longer wavelengths (like red or yellow) by these particles. This is called **Rayleigh scattering**.

3. **Why Blue, Not Violet**: While violet light is scattered even more than blue, the sun emits less violet light, and our eyes are more sensitive to blue. Additionally, some violet light is absorbed by the upper atmosphere. This combination makes the sky appear **blue** to us.

4. **Exceptions**: 
   - **Sunrise/Sunset**: During these times, sunlight travels through more atmosphere, scattering out the blue light and leaving reds and oranges to dominate.
   - **Pollution/Weather**: Hazy or polluted air can scatter light differently, making the sky appear gray or white.

5. **In Space**: Without an atmosphere, the sky would be black because there are no molecules to scatter sunlight.

So, the blue sky is a result of how Earth's atmosphere interacts with sunlight! 🌈
*******
response: True
thinking: Okay, the user is asking if the sky is blue. I know that the sky appears blue due to Rayleigh scattering. Let me recall how that works. Sunlight is made up of different colors, each with different wavelengths. When sunlight enters Earth's atmosphere, the shorter wavelengths (like blue and violet) scatter more than the longer ones (like red and yellow). But why do we see blue instead of violet? Well, the sun emits more blue light than violet, and our eyes are more sensitive to blue. Also, some of the violet light is absorbed by the upper atmosphere. So the scattered blue light fills the sky, making it appear blue. However, during sunrise or sunset, the light has to pass through more atmosphere, scattering the blue light out of our line of sight, which is why the sky turns red or orange. I should also mention that the sky can appear different colors under various conditions, like pollution or weather. But the main reason is Rayleigh scattering. Let me make sure I'm not missing any other factors. Maybe mention that the color can vary depending on the time of day and atmospheric conditions. Yeah, that's right. So the answer is yes, the sky is blue, but with some nuances.
reasoning: The sky appears blue due to Rayleigh scattering, where shorter wavelengths of light (blue and violet) are scattered more by the atmosphere. The sun emits more blue light, and our eyes are more sensitive to it, making the sky appear blue. During sunrise or sunset, the light travels a longer path, scattering blue light out of our view, resulting in red or orange hues. The color can also vary with atmospheric conditions.
```

@nicolab28 commented on GitHub (May 3, 2025):

I use the AI SDK, and I add a `thinking` string field to my JSON schema.
With qwen3 4b, 8b, 30b and 32b this works and I can see the model's thinking,
but the 14b model returns nothing at all, just an empty JSON object.
On the other hand, the `$schema` key sometimes shows up in the response, and I have even received the whole schema back as the response.

Here are a few examples:

```
modelFullName ollama/qwen3:4b-q8_0
prompt: could you bring me a beer?
lastObject {
  "thinking": "The user asks me to bring a beer. However, I don't have a device for that. I must inform them that this can't be done.",
  "message": "Sorry, I can't bring a beer. I don't have a device for that.",
  "toolCalls": []
}
```

```
modelFullName ollama/qwen3:8b
prompt: could you bring me a beer?
lastObject {
  "thinking": "The user is asking for a beer. I need to check whether I can perform this action. I don't have a tool to bring a beer. I must inform the user that I can't fulfill this request and specify that I need an update.",
  "message": "Sorry, I can't bring a beer. A system update is required to perform this action.",
  "toolCalls": [],
  "$schema": "http://json-schema.org/draft-07/schema#"
}
```

```
modelFullName ollama/qwen3:14b
prompt: could you bring me a beer?
lastObject {}
```

```
modelFullName ollama/qwen3:30b-a3b-q4_K_M
prompt: could you bring me a beer?
lastObject {
  "thinking": "The user is asking for a beer. I don't have a tool to bring a beer. I must tell them that I can't do it.",
  "message": "I can't bring you a beer. I am a voice assistant.",
  "toolCalls": [],
  "$schema": "http://json-schema.org/draft-07/schema#"
}
```

```
modelFullName ollama/qwen3:32b-q4_K_M
prompt: could you bring me a beer?
lastObject {
  "thinking": "I can't bring a beer to Nicolas, because I have no capability to manipulate physical objects. I can suggest that he go get one himself.",
  "message": "Sorry, I can't bring a beer. You'll have to go get one yourself.",
  "toolCalls": []
}
```

```
modelFullName ollama/qwen3:4b-q8_0
prompt: turn on the kitchen light for 10 seconds
lastObject {
  "thinking": "The user asks to turn on the kitchen light for 10 seconds. I must use the 'onoff' tool to turn on the kitchen and then 'wait' to wait 10 seconds.",
  "message": "The kitchen light is on. It will stay on for 10 seconds.",
  "toolCalls": [
    {
      "tool": "onoff",
      "args": {
        "dev": "CUISINE",
        "on": true
      }
    },
    {
      "tool": "wait",
      "args": {
        "sec": 10
      }
    }
  ]
}
```

```
modelFullName ollama/qwen3:8b
prompt: turn on the kitchen light for 10 seconds
lastObject {
  "thinking": "The user asks to turn on the kitchen light for 10 seconds. I must first turn on the light, then wait 10 seconds before turning it off.",
  "message": "I will turn on the kitchen light for 10 seconds.",
  "toolCalls": [
    {
      "tool": "onoff",
      "args": {
        "dev": "CUISINE",
        "on": true
      }
    },
    {
      "tool": "wait",
      "args": {
        "sec": 10
      }
    },
    {
      "tool": "onoff",
      "args": {
        "dev": "CUISINE",
        "on": false
      }
    }
  ]
}
```

```
modelFullName ollama/qwen3:14b
prompt: turn on the kitchen light for 10 seconds
lastObject {}
```

```
modelFullName ollama/qwen3:30b-a3b-q4_K_M
prompt: turn on the kitchen light for 10 seconds
lastObject {
  "thinking": "The user asks to turn on the kitchen light for 10 seconds. To do this, I must first turn on the CUISINE device using the onoff tool. Next, I must wait 10 seconds with the wait tool. Finally, I must turn off the CUISINE device with the onoff tool. So I will call the tools onoff, wait, then onoff again.",
  "message": "I will turn on the kitchen light for 10 seconds.",
  "toolCalls": [
    {
      "tool": "onoff",
      "args": {
        "dev": "CUISINE",
        "on": true
      }
    },
    {
      "tool": "wait",
      "args": {
        "sec": 10
      }
    },
    {
      "tool": "onoff",
      "args": {
        "dev": "CUISINE",
        "on": false
      }
    }
  ]
}
```

```
modelFullName ollama/qwen3:32b-q4_K_M
prompt: turn on the kitchen light for 10 seconds
lastObject {
  "thinking": "The user wants to turn on the kitchen light for 10 seconds. I must first turn on the device, then turn it off after the delay.",
  "message": "I will turn on the kitchen light and turn it off after 10 seconds.",
  "toolCalls": [
    {
      "tool": "onoff",
      "args": {
        "dev": "CUISINE",
        "on": true
      }
    },
    {
      "tool": "wait",
      "args": {
        "sec": 10
      }
    },
    {
      "tool": "onoff",
      "args": {
        "dev": "CUISINE",
        "on": false
      }
    }
  ]
}
```
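These examples show two symptoms: an empty `{}` from qwen3:14b, and a stray `$schema` key from some models. A defensive client-side check may help with both. This is only a sketch: `parse_reply` and the required field names are assumptions taken from these examples, and the check catches bad replies without fixing whatever causes them.

```python
import json

# Field names assumed from the examples above; adjust to your own schema.
REQUIRED_FIELDS = ("thinking", "message", "toolCalls")


def parse_reply(raw):
    """Parse a structured reply, dropping echoed schema metadata.

    Returns the cleaned dict, or None when the reply is not a JSON object, is empty,
    or lacks a required field, so the caller can retry or fall back.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    # Keys such as "$schema" are schema metadata the model sometimes copies into its output.
    cleaned = {k: v for k, v in obj.items() if not k.startswith("$")}
    if not all(field in cleaned for field in REQUIRED_FIELDS):
        return None
    return cleaned
```

Returning `None` rather than raising keeps retry logic simple, for example re-asking the same model or falling back to a different model size.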


@rick-github commented on GitHub (May 3, 2025):

If other models work and qwen3:14b doesn't, that would seem to be a problem with the model.


@functorism commented on GitHub (May 16, 2025):

@rick-github Can you please read the original issue again.

@bjoernhommel You're absolutely right: there needs to be more flexibility and control over the logit biasing for structured output when reasoning models are used.

The entire point here is that using a reasoning model with structured output should not **require full omission of reasoning**. Ollama needs to let reasoning and structured output coexist.


@functorism commented on GitHub (May 16, 2025):

![Image](https://github.com/user-attachments/assets/f1add1e1-31b8-4945-ae46-057eadb6dee9)
https://github.com/ollama/ollama/pull/10584 - WIP thinking API support


@ParthSareen commented on GitHub (May 16, 2025):

Once the thinking API lands, I can take a look at generating the thinking first and then applying the masks only to the content that follows. Ideally, we could have "thinking" enabled and apply structured outputs only after that portion. In the meantime, the example @rick-github gave is a good way to go about it.
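The plan above is to generate the thinking first and apply the masks only to the content after it. One way to sketch this is a per-step mask that activates once `</think>` has been emitted. This toy illustration uses assumed names (`phase_aware_mask`, `greedy`) and is not Ollama's implementation. A real grammar engine would compute `allowed_by_schema` from its schema state at each step, starting that state fresh when the content phase begins.

```python
import math

THINK_CLOSE = "</think>"


def phase_aware_mask(logits, generated, allowed_by_schema):
    """Mask logits only after the thinking block has ended.

    `generated` is the list of tokens emitted so far; `allowed_by_schema` is the set
    of tokens the JSON grammar permits at this position (assumed computed elsewhere).
    """
    if THINK_CLOSE not in generated:
        # Thinking phase: leave the distribution untouched so the model can reason.
        return dict(logits)
    # Content phase: forbid every token the schema does not allow.
    return {t: (v if t in allowed_by_schema else -math.inf) for t, v in logits.items()}


def greedy(logits):
    """Pick the highest-scoring token (temperature 0)."""
    return max(logits, key=logits.get)


logits = {"<think>": 4.0, "{": 1.0, "Yes": 0.5}
first = greedy(phase_aware_mask(logits, [], {"{"}))
after_thinking = greedy(phase_aware_mask(logits, ["<think>", "...", THINK_CLOSE], {"{"}))
```

At the first step the model is free to open `<think>`. Once `</think>` has been generated, only schema-conforming tokens such as `{` survive.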


@iamFIREcracker commented on GitHub (Jun 27, 2025):

Assuming we could indeed get an LLM to output both thinking and structured output, would there be any advantage to the solution proposed by @rick-github? Does anyone have an idea whether reasoning LLMs produce a better chain of thought inside the standard `<think>` tags than in a structured output that includes a slot for the reasoning? I would assume the chain of thought inside the `<think>` tag to be qualitatively better, since the model was trained on it, but I have no data to back up this assumption.


@morrisalp commented on GitHub (Aug 17, 2025):

Too bad this is not supported now! I am migrating from ollama to llama.cpp specifically because of this issue.
Anecdotally, I found that using a reasoning field in the JSON output performs much worse.
Supporting GBNF would also solve this issue, but it seems it's not being considered (#6237, #11911, #5899) for reasons that I don't fully understand.


@sjsone commented on GitHub (Aug 23, 2025):

@morrisalp I had the same issue and created an MR: https://github.com/ollama/ollama/pull/12055
Please test it and give the MR some props.


@guilhermecxe commented on GitHub (Sep 25, 2025):

> Assuming we could indeed get an LLM to output both thinking and structured output, would there be any advantages to the solution proposed by @rick-github? I mean, anyone has any idea if reasoning LLMs will produce better chain of thoughts if done inside the standard `<think>` tags vs using structured output that includes one slot for the reasoning? I would assume the chain of thought inside the `<think>` tag to be qualitatively better as the model got trained on it, but I have no data to back up this assumption.

The paper that showed the benefits of CoT (https://arxiv.org/abs/2201.11903) concluded that producing the reasoning after the answer was not as effective as reasoning before the answer. So I think that if we ensure the thinking is generated before the answer of interest, we can get similar results, yes.
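If the reasoning has to live inside the JSON, field order is one practical lever for making it come before the answer. The sketch below assumes the constrained decoder emits object keys in the order the schema declares them, which is worth verifying for your Ollama version. The schema and `reasoning_precedes_answer` are hypothetical.

```python
import json

# Hypothetical schema: "thinking" is declared before "answer" so that, if keys are
# generated in declaration order, the reasoning is written before the answer.
SCHEMA = {
    "type": "object",
    "properties": {
        "thinking": {"type": "string"},
        "answer": {"type": "boolean"},
    },
    "required": ["thinking", "answer"],
}


def reasoning_precedes_answer(raw):
    """Return True if the "thinking" key appears before "answer" in the raw JSON text."""
    keys = list(json.loads(raw))  # dicts keep insertion order, i.e. the order in the text
    return keys.index("thinking") < keys.index("answer")
```

With Pydantic, `model_json_schema()` lists properties in field-definition order. In the `Answer` model earlier in this thread, `response` is declared before `thinking`, so under this assumption the answer would be generated before the reasoning.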


Reference: github-starred/ollama#68993