[GH-ISSUE #15386] Gemma4:31b Structured Output inconsistent with Reasoning / Thinking #71900

Open
opened 2026-05-05 02:53:55 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @martinpintar-pixel on GitHub (Apr 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15386

What is the issue?

ISSUE: model makes a decision in reasoning that is not reflected in final JSON output.

Happens on edge cases in my system. Never noticed this when running models like gpt-oss:120b. I don't know if this is a problem with Ollama or the model itself.

I am running gemma4:31b to make a decision in an automated ticketing system. The decisions are 'SEND' and 'SKIP'. I enforce the values with:

class Decision(BaseModel): decision: Literal['SEND', 'SKIP']

As described in docs I use the format= to pass this structure to the model. I have not passed the think= to the chat function.

REASONING EXAMPLE (only the ending):

Conclusion: All criteria for SEND are met.

Self-Correction/Double Check:
Is there any ambiguity?

  • Ticket: "Ne pridem do dokumentov" (I can't get to the documents).

  • Problem: "Uporabnik nima ustreznega dostopa (read / modify)".

  • Solution: Use FIM to add yourself to the group.

  • Problem Details: K:\ is covered.

  • Everything aligns.

    Final Decision: SEND.

OUTPUT:

Decision(decision='SKIP')

Any info on this inconsistency would be very helpful!

Relevant log output


OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.20.2

Originally created by @martinpintar-pixel on GitHub (Apr 7, 2026). Original GitHub issue: https://github.com/ollama/ollama/issues/15386 ### What is the issue? ISSUE: model makes a decision in reasoning that is not reflected in final JSON output. Happens on edge cases in my system. Never noticed this when running models like gpt-oss:120b. I don't know if this is a problem with Ollama or the model itself. I am running `gemma4:31b` to make a decision in an automated ticketing system. The decisions are 'SEND' and 'SKIP'. I enforce the values with: `class Decision(BaseModel): decision: Literal['SEND', 'SKIP']` As described in docs I use the `format=` to pass this structure to the model. I have not passed the `think=` to the `chat` function. REASONING EXAMPLE (only the ending): > Conclusion: All criteria for SEND are met. > > *Self-Correction/Double Check:* > Is there any ambiguity? > - Ticket: "Ne pridem do dokumentov" (I can't get to the documents). > - Problem: "Uporabnik nima ustreznega dostopa (read / modify)". > - Solution: Use FIM to add yourself to the group. > - Problem Details: `K:\` is covered. > - Everything aligns. > > Final Decision: SEND. OUTPUT: > Decision(decision='SKIP') Any info on this inconsistency would be very helpful! ### Relevant log output ```shell ``` ### OS Windows ### GPU Nvidia ### CPU Intel ### Ollama version 0.20.2
GiteaMirror added the bug label 2026-05-05 02:53:55 -05:00
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

Does your prompt instruct the model that JSON output is required?

<!-- gh-comment-id:4198223881 --> @rick-github commented on GitHub (Apr 7, 2026): Does your prompt instruct the model that JSON output is required?
Author
Owner

@martinpintar-pixel commented on GitHub (Apr 7, 2026):

No, my system prompt does not mention any JSON outputs. It only instructs the LLM to respond with either 'SEND' or 'SKIP'. I thought the format= handles all the formatting instructions for the model.

<!-- gh-comment-id:4198260548 --> @martinpintar-pixel commented on GitHub (Apr 7, 2026): No, my system prompt does not mention any JSON outputs. It only instructs the LLM to respond with either 'SEND' or 'SKIP'. I thought the `format=` handles all the formatting instructions for the model.
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

format is just a constraint on the output generated during the non-thinking phase, the model is not aware during the thinking phase that its output is going to be forced to comply to a pattern. So it could be that the tokens that are being generated are expected to follow a more conversational approach, since that's the usual way a model delivers results. Suddenly when it comes time to fill out the content part of the response, the model finds it can't generates the tokens it thought it was going to, so the response may diverge from what was intended. Informing the model that JSON output is expected may allow it to prepare the output that more readily conforms to the format. Ideally you would include the schema in the instructions.

<!-- gh-comment-id:4198298526 --> @rick-github commented on GitHub (Apr 7, 2026): `format` is just a constraint on the output generated during the non-thinking phase, the model is not aware during the thinking phase that its output is going to be forced to comply to a pattern. So it could be that the tokens that are being generated are expected to follow a more conversational approach, since that's the usual way a model delivers results. Suddenly when it comes time to fill out the `content` part of the response, the model finds it can't generates the tokens it thought it was going to, so the response may diverge from what was intended. Informing the model that JSON output is expected may allow it to prepare the output that more readily conforms to the `format`. Ideally you would include the schema in the instructions.
Author
Owner

@martinpintar-pixel commented on GitHub (Apr 8, 2026):

Thank you for your response. I followed your advice but unfortunately to no visible improvement. I tested further and expanded my JSON schema to include also the reasoning:
class Decision(BaseModel): decision: Literal['SEND', 'SKIP'] reasoning: str

The reasoning I receive is completely different to the thinking it produces. As it was another model making the final decision.

Once I remove the format= the output finally becomes consistent with the thinking.

Do you maybe have more info on this issue? There are also other issues describing difficulties with thinking in gemma4 #15416

<!-- gh-comment-id:4206145459 --> @martinpintar-pixel commented on GitHub (Apr 8, 2026): Thank you for your response. I followed your advice but unfortunately to no visible improvement. I tested further and expanded my JSON schema to include also the reasoning: `class Decision(BaseModel): decision: Literal['SEND', 'SKIP'] reasoning: str` The `reasoning` I receive is completely different to the thinking it produces. As it was another model making the final decision. Once I remove the `format=` the output finally becomes consistent with the `thinking`. Do you maybe have more info on this issue? There are also other issues describing difficulties with thinking in gemma4 #15416
Author
Owner

@rick-github commented on GitHub (Apr 8, 2026):

Can you provide a prompt that results in the output being inconsistent with the reasoning?

<!-- gh-comment-id:4206181740 --> @rick-github commented on GitHub (Apr 8, 2026): Can you provide a prompt that results in the output being inconsistent with the reasoning?
Author
Owner

@martinpintar-pixel commented on GitHub (Apr 8, 2026):

I am attaching system and user prompts, thinking and final output (decision).

system_prompt.txt

user_prompt.txt

thinking.txt

decision.txt

<!-- gh-comment-id:4206946052 --> @martinpintar-pixel commented on GitHub (Apr 8, 2026): I am attaching system and user prompts, thinking and final output (decision). [system_prompt.txt](https://github.com/user-attachments/files/26571078/system_prompt.txt) [user_prompt.txt](https://github.com/user-attachments/files/26571081/user_prompt.txt) [thinking.txt](https://github.com/user-attachments/files/26571079/thinking.txt) [decision.txt](https://github.com/user-attachments/files/26571086/decision.txt)
Author
Owner

@dwohlfahrt commented on GitHub (Apr 13, 2026):

i'm seeing this exact behavior. i don't believe this is an issue with ollama as i'm seeing the same behavior when running via llama.cpp directly. hoping it's a chat template issue... if it's an inherent issue with the model itself, i'm going to be very sad.

<!-- gh-comment-id:4239433702 --> @dwohlfahrt commented on GitHub (Apr 13, 2026): i'm seeing this exact behavior. i don't believe this is an issue with ollama as i'm seeing the same behavior when running via llama.cpp directly. hoping it's a chat template issue... if it's an inherent issue with the model itself, i'm going to be very sad.
Author
Owner

@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15386
Analyzed: 2026-04-18T18:22:24.751502

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.

<!-- gh-comment-id:4274309954 --> @PureBlissAK commented on GitHub (Apr 18, 2026): <!-- ollama-issue-orchestrator:v1 issue:15386 --> ## 🤖 Automated Triage & Analysis Report **Issue**: #15386 **Analyzed**: 2026-04-18T18:22:24.751502 ### Analysis - **Type**: unknown - **Severity**: medium - **Components**: unknown ### Implementation Plan - **Effort**: medium - **Steps**: *This issue has been triaged and marked for implementation.*
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#71900