[GH-ISSUE #14196] Delayed Constrained Decoding for Thinking with JSON/Structured Outputs #71311

Open
opened 2026-05-05 01:11:10 -05:00 by GiteaMirror · 2 comments

Originally created by @ardafincan on GitHub (Feb 10, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14196

Currently, when a model with thinking ability is used together with the `format` parameter, the thinking process is entirely suppressed.

So I would like to propose a "Delayed Constrained Decoding" mechanism.

The inference engine should allow the model to generate tokens unconstrained until it emits the `</think>` token.
Only after that should the JSON/schema constraint be enforced on the remaining decoding steps.
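
To make the idea concrete, here is a minimal toy sketch in Python. It is not Ollama's sampler: `propose`, `toy_grammar`, the scripted model, and the `<eos>` token are all hypothetical stand-ins, and decoding is plain greedy. It only illustrates the switch from unconstrained to constrained decoding once `</think>` has been emitted.

```python
import json
from typing import Callable, List, Set

THINK_END = "</think>"  # end-of-thinking marker named in the proposal
EOS = "<eos>"           # hypothetical end-of-sequence token for this toy

def delayed_constrained_decode(
    propose: Callable[[List[str]], List[str]],
    allowed_after_think: Callable[[List[str]], Set[str]],
    max_steps: int = 64,
) -> List[str]:
    """Greedy loop: unconstrained until THINK_END, constrained afterwards."""
    out: List[str] = []     # every emitted token, thinking included
    answer: List[str] = []  # tokens emitted after THINK_END only
    constrained = False
    for _ in range(max_steps):
        ranked = propose(out)  # candidates, best first
        if constrained:
            # Drop every candidate the grammar does not allow at this point.
            allowed = allowed_after_think(answer)
            ranked = [t for t in ranked if t in allowed]
            if not ranked:
                break
        tok = ranked[0]
        if tok == EOS:
            break
        out.append(tok)
        if constrained:
            answer.append(tok)
        elif tok == THINK_END:
            constrained = True  # the constraint starts with the next token
    return out

# Hypothetical scripted "model": ranked candidates for each step.
SCRIPT = [["hmm"], [THINK_END], ["Sure", "{"], ['"ok"'], [":"],
          ["maybe", "true"], ["}"], [EOS]]

def propose(out: List[str]) -> List[str]:
    return SCRIPT[len(out)] if len(out) < len(SCRIPT) else [EOS]

# Hypothetical grammar that accepts only {"ok": true} or {"ok": false}.
JSON_STEPS = [{"{"}, {'"ok"'}, {":"}, {"true", "false"}, {"}"}, {EOS}]

def toy_grammar(answer: List[str]) -> Set[str]:
    return JSON_STEPS[min(len(answer), len(JSON_STEPS) - 1)]

out = delayed_constrained_decode(propose, toy_grammar)
thinking = out[: out.index(THINK_END)]
parsed = json.loads("".join(out[out.index(THINK_END) + 1 :]))
```

In this toy run the thinking token passes through untouched, while `Sure` and `maybe` are filtered out after `</think>` because the grammar rejects them. One open design question the sketch leaves aside: if the model never emits `</think>`, the constraint never engages, so a real implementation would need a fallback.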

I am thinking of working on this one. What do you think?

GiteaMirror added the feature request label 2026-05-05 01:11:10 -05:00

@ardafincan commented on GitHub (Feb 14, 2026):

Correction: the thinking ability gets suppressed when the request is sent to `/api/generate`. Someone has already solved the issue in `/api/chat`, but it remains in the generate endpoint. I am working on this.
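
For anyone who wants to check the behaviour, the following is a rough reproduction sketch in Python. It assumes a local Ollama server on the default port 11434 and that the `think` request field and the `thinking` response field behave as described in the Ollama API docs. The model name and schema in the usage note are placeholders, and `build_generate_request` and `thinking_was_suppressed` are hypothetical helper names.

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str, schema: dict,
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST to /api/generate that asks for thinking and structured output."""
    payload = {
        "model": model,
        "prompt": prompt,
        "format": schema,  # JSON schema for the constrained output
        "think": True,     # request the thinking trace
        "stream": False,   # a single JSON response is easier to inspect
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def thinking_was_suppressed(response_body: dict) -> bool:
    """True when the response carries no (or an empty) thinking trace."""
    return not response_body.get("thinking")
```

To run it against a live server, pass the request to `urllib.request.urlopen`, decode the body with `json.loads`, and hand the result to `thinking_was_suppressed`. Use any thinking-capable model you have pulled locally and a small schema such as `{"type": "object"}`.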


@rick-github commented on GitHub (Feb 17, 2026):

FYI #14288

Reference: github-starred/ollama#71311