[GH-ISSUE #10929] Ollama produces invalid JSON when using thinking mode with structured output #7192

Open
opened 2026-04-12 19:11:25 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @roosephu on GitHub (May 31, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10929

Originally assigned to: @drifkin on GitHub.

What is the issue?

Here is a script to reproduce the issue:

curl -X POST http://localhost:11434/api/generate \
  -d '{
    "model": "qwen3:0.6b", 
    "prompt": "Generate JSON with summary for: # Test Post\nThis is about Python programming.",
    "stream": false,
    "think": true,
    "format": {
      "type": "object",
      "properties": {
        "summary": {"type": "string"}
      },
      "required": ["summary"]
    }
  }'

Output:

{"model":"qwen3:0.6b","created_at":"2025-05-31T11:12:37.379051Z","response":"{\"{\"summary\": \"This is a test post about Python programming. It is a demonstration of how to create and use Python scripts to solve programming problems. The post provides information on how to write and test Python code, including syntax, variables, and functions. It is for educational purposes and may be used to teach Python to students or developers. The post is not intended to be used for any specific application or purpose. The post provides a general overview of Python programming and its use in various fields. The summary is concise and includes key points for a user to understand. The post is in English and is for a general audience.\"}\n","done":true,"done_reason":"stop","context":[151644,872,198,31115,4718,448,12126,369,25,671,3393,3877,198,1986,374,911,13027,15473,13,608,26865,151645,198,151644,77091,198,4913,1708,788,330,1986,374,264,1273,1736,911,13027,15473,13,1084,374,264,29716,315,1246,311,1855,323,990,13027,19502,311,11625,15473,5322,13,576,1736,5707,1995,389,1246,311,3270,323,1273,13027,2038,11,2670,19482,11,7332,11,323,5746,13,1084,374,369,16229,9895,323,1231,387,1483,311,4538,13027,311,4143,476,13402,13,576,1736,374,537,10602,311,387,1483,369,894,3151,3766,476,7428,13,576,1736,5707,264,4586,23251,315,13027,15473,323,1181,990,304,5257,5043,13,576,12126,374,63594,323,5646,1376,3501,369,264,1196,311,3535,13,576,1736,374,304,6364,323,374,369,264,4586,10650,1189,532],"total_duration":1114459042,"load_duration":27570000,"prompt_eval_count":26,"prompt_eval_duration":28529667,"eval_count":127,"eval_duration":1057852250}

Note that {" occurs twice at the beginning of the response field.
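To make the failure concrete, here is a minimal sketch (not from the original report; the payload string is a shortened stand-in for the real response) showing that the doubled prefix breaks json.loads, and that stripping the stray leading {" exactly once restores a parseable object:

```python
import json

# Shortened stand-in for the malformed model output reported above.
bad = '{"{"summary": "This is a test post about Python programming."}'

try:
    json.loads(bad)
except json.JSONDecodeError:
    print("invalid JSON, as reported")  # the doubled '{"' prefix is rejected

# Hedged workaround: drop the stray leading '{"' exactly once.
fixed = bad[2:] if bad.startswith('{"{"') else bad
print(json.loads(fixed)["summary"])
```

This is only a client-side patch over the symptom; the extra prefix itself appears to come from the server when "think": true is combined with "format".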

If the thinking mode is turned off, response is a valid JSON:


curl -X POST http://localhost:11434/api/generate \
  -d '{
    "model": "qwen3:0.6b", 
    "prompt": "Generate JSON with summary for: # Test Post\nThis is about Python programming.",
    "stream": false,
    "think": false,
    "format": {
      "type": "object",
      "properties": {
        "summary": {"type": "string"}
      },
      "required": ["summary"]
    }
  }'
{"model":"qwen3:0.6b","created_at":"2025-05-31T11:13:24.352585Z","response":"{\n  \"summary\": \"This test post is about Python programming. It includes code snippets that demonstrate basic concepts such as variables, functions, and data types.\"\n}","done":true,"done_reason":"stop","context":[151644,872,198,31115,4718,448,12126,369,25,671,3393,3877,198,1986,374,911,13027,15473,13,608,2152,5854,766,151645,198,151644,77091,198,151667,271,151668,271,515,220,330,1708,788,330,1986,1273,1736,374,911,13027,15473,13,1084,5646,2038,68642,429,19869,6770,18940,1741,438,7332,11,5746,11,323,821,4494,10040,92],"total_duration":374914458,"load_duration":29720833,"prompt_eval_count":32,"prompt_eval_duration":44193625,"eval_count":34,"eval_duration":300610708}
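For contrast, a small sketch (using a hand-shortened stand-in for the reply above, not the original bytes) of how a client normally unwraps the valid case: the outer /api/generate JSON carries the model output as a string in its response field, which itself parses as JSON when "think" is false:

```python
import json

# Hand-shortened stand-in for the valid /api/generate reply shown above.
api_reply = (
    '{"model": "qwen3:0.6b", '
    '"response": "{\\n  \\"summary\\": \\"This test post is about Python programming.\\"\\n}", '
    '"done": true, "done_reason": "stop"}'
)

outer = json.loads(api_reply)          # the API envelope
inner = json.loads(outer["response"])  # the structured output itself
print(inner["summary"])
```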

(It also seems that when structured output is requested, the thinking process is omitted, but that's another topic.)

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.9.0

GiteaMirror added the thinking and bug labels 2026-04-12 19:11:26 -05:00
Author
Owner

@mlaihk commented on GitHub (Jun 2, 2025):

I noticed a weird performance issue with Ollama 0.7 and newer (including 0.9.0) when using Gemma3 models. After I removed OLLAMA_KV_CACHE_TYPE (originally set to q8_0), Gemma3 performance came back (in terms of throughput and output accuracy).
Something is not working right with KV cache quantization and the Gemma3 models.

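For anyone wanting to try the same mitigation, the variable can be cleared before restarting the server (variable name taken from the comment above; whether this helps in your setup is an assumption):

```shell
# Clear the KV-cache quantization override reported to degrade Gemma3,
# then restart the Ollama server so the change takes effect.
unset OLLAMA_KV_CACHE_TYPE
ollama serve
```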
Author
Owner

@jbcallaghan commented on GitHub (Jul 30, 2025):

Any update on this? I am seeing the same issue with qwen3 on Ollama 0.9.6.

Author
Owner

@jbcallaghan commented on GitHub (Jul 31, 2025):

I created a rudimentary workaround for this when using structured output.

from json import JSONDecodeError
from typing import Any

from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers.json import JsonOutputParser
from langchain_core.outputs import Generation
from langchain_core.utils.json import parse_json_markdown

class CustomJsonOutputParser(JsonOutputParser):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._buffer = ""

    def parse_result(self, result: list[Generation], *, partial: bool = False) -> Any:
        chunk = result[0].text
        self._buffer += chunk

        # Fix the malformed '{"{"' prefix if it appears exactly at the start
        if self._buffer.startswith('{"{"'):
            self._buffer = '{' + self._buffer[3:]

        try:
            parsed = parse_json_markdown(self._buffer)
            self._buffer = ""  # clear the buffer after a successful parse
            return parsed
        except JSONDecodeError as e:
            if partial:
                return None
            raise OutputParserException(f"Invalid json output: {self._buffer}") from e

structured_chain = model | self._extract_response_metadata | CustomJsonOutputParser() | self._extract_response_answer_fields
Reference: github-starred/ollama#7192