[PR #15678] server: apply format when think=false with thinking-capable parser #25747

Open
opened 2026-04-19 18:25:35 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15678
Author: @ParthSareen
Created: 4/18/2026
Status: 🔄 Open

Base: main ← Head: parth/fix-format-think-false


📝 Commits (1)

  • 815f043 server: apply format when think=false with thinking-capable parser

📊 Changes

2 files changed (+133 additions, -2 deletions)


📝 server/routes.go (+7 -2)
📝 server/routes_generate_test.go (+126 -0)

📄 Description

Summary

  • Fixes #15260. When a model uses a thinking-capable builtin parser (e.g. gemma4, qwen3.5) and the request sets think: false together with format, the schema was silently dropped and the model returned unconstrained plain text.
  • Root cause: format was deferred for all thinking-capable parsers and only re-enabled after a thinking→content transition. With think=false, no thinking span is emitted, so the transition never fires and the format is never applied.
  • Fix: gate the deferral on req.Think — if thinking is off for this request, apply the format from the first token instead. think=true and think omitted keep the existing two-request flow.
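
The gating rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from server/routes.go: the helper name shouldDeferFormat and its parameters are hypothetical, and the think setting is simplified to a *bool (nil meaning "omitted"), which may not match the real request type.

```go
package main

import "fmt"

// shouldDeferFormat is a hypothetical helper (not part of the ollama codebase).
// It reports whether the format constraint should be held back until a
// thinking-to-content transition, per the rule in the PR summary.
//
// think is simplified to *bool here: nil means the field was omitted.
func shouldDeferFormat(parserHasThinking bool, think *bool, hasFormat bool) bool {
	// Nothing to defer without a format, and non-thinking parsers never defer.
	if !hasFormat || !parserHasThinking {
		return false
	}
	// think omitted: keep the existing two-request flow.
	if think == nil {
		return true
	}
	// think=true defers; think=false applies the format from the first token.
	return *think
}

func main() {
	on, off := true, false
	fmt.Println("think=false defers:", shouldDeferFormat(true, &off, true))
	fmt.Println("think=true defers:", shouldDeferFormat(true, &on, true))
	fmt.Println("think omitted defers:", shouldDeferFormat(true, nil, true))
}
```

Of the three cases printed, only think=false skips the deferral, which matches the summary's statement that think=true and think omitted keep the existing flow.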

Test plan

  • New unit test TestChatFormatWithThinkFalse in server/routes_generate_test.go creates a model with the gemma4 parser, sends a think=false + format chat request, and asserts the first (and only) completion call receives the format. Verified it fails on main (got "") and passes with the patch.
  • go test ./server/ -run 'TestChat|TestGenerate' passes.
  • End-to-end live test with qwen3.5:latest:
    • System 0.21.0 (unpatched), think=false + schema → plain text (bug reproduced).
    • Patched build, think=false + schema → {"emotion":"neutral","response_text":"..."}, thinking empty.
    • Patched build, think=true + schema → valid JSON with thinking populated (unchanged).
    • Patched build, think omitted + schema → valid JSON with thinking populated (unchanged).

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 18:25:35 -05:00

Reference: github-starred/ollama#25747