[GH-ISSUE #14570] qwen3 tool call parser returns 500 when model output is truncated #71507

Open
opened 2026-05-05 01:57:36 -05:00 by GiteaMirror · 3 comments

Originally created by @lefoulkrod on GitHub (Mar 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14570

Summary

When using Qwen3 models with tool calling, if the model's generated tool call JSON is truncated (because generation hit num_predict or the context window limit), the parser at qwen3.go:108 fails on the incomplete JSON and Ollama returns an HTTP 500 instead of handling the truncation gracefully.

This makes Qwen3 tool calling unreliable for tasks that produce large tool call arguments (e.g., writing files with substantial content).

Behavior

Expected: When the model's output is truncated mid-tool-call, Ollama should either:

  1. Return the generated text as regular content (not as a tool call), or
  2. Return a response with done_reason: "length" so the client can handle it

Actual: Ollama returns HTTP 500 with no usable response. The server log shows:

level=WARN source=qwen3.go:108 msg="qwen3 tool call parsing failed" error="failed to parse JSON: unexpected end of JSON input"
[GIN] | 500 | 1m35s | 127.0.0.1 | POST "/api/chat"
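
For reference, "unexpected end of JSON input" is the standard error from Go's encoding/json when the buffer ends mid-object, which is consistent with generation being cut off at the token limit. A minimal standalone illustration (not the actual qwen3.go code):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// A tool call whose arguments were cut off by the token limit.
	truncated := []byte(`{"name":"write_file","arguments":{"path":"/tmp/server.py","content":"import http.server`)

	var call struct {
		Name      string          `json:"name"`
		Arguments json.RawMessage `json:"arguments"`
	}
	if err := json.Unmarshal(truncated, &call); err != nil {
		fmt.Println(err) // prints: unexpected end of JSON input
	}
}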

Reproduction

  1. Use any Qwen3 model with tools enabled
  2. Send a prompt that causes the model to emit a tool call with a very large argument (e.g., "write a complete HTML game to a file" where the file content is a tool call argument)
  3. Use default num_predict (or set it low enough that the tool call JSON gets truncated)

The model generates output for 1-3 minutes, hits the token limit, and the incomplete tool call JSON fails to parse.

With num_predict: -1: Most calls succeed (200), but if the conversation history + generation fills the context window, the same truncation/500 occurs.

Minimal repro

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3.5:35b-a3b",
  "stream": false,
  "options": {"num_predict": 200},
  "tools": [{
    "type": "function",
    "function": {
      "name": "write_file",
      "description": "Write content to a file",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {"type": "string"},
          "content": {"type": "string"}
        },
        "required": ["path", "content"]
      }
    }
  }],
  "messages": [
    {"role": "user", "content": "Write a 500-line Python web server to /tmp/server.py"}
  ]
}'

With num_predict: 200, the model will attempt a write_file tool call but the JSON will be truncated, resulting in a 500.

Log excerpt

Over a ~20 minute window with repeated retries on the same prompt:

19:16:02 level=WARN source=qwen3.go:108 msg="qwen3 tool call parsing failed" error="failed to parse JSON: unexpected end of JSON input"
19:16:02 [GIN] | 500 | 43.17s | POST "/api/chat"
19:20:08 level=WARN source=qwen3.go:108 msg="qwen3 tool call parsing failed" error="failed to parse JSON: unexpected end of JSON input"
19:20:08 [GIN] | 500 | 1m47s | POST "/api/chat"
19:21:47 level=WARN source=qwen3.go:108 msg="qwen3 tool call parsing failed" error="failed to parse JSON: unexpected end of JSON input"
19:21:47 [GIN] | 500 | 1m38s | POST "/api/chat"
(repeated ~12 more times, every ~1.5 min)

After setting num_predict: -1, most calls succeed, but the 500 still occurs when the context window fills:

19:48:52 level=WARN source=qwen3.go:108 msg="qwen3 tool call parsing failed" error="failed to parse JSON: unexpected end of JSON input"
19:48:52 [GIN] | 500 | 1m15s | POST "/api/chat"

Environment

  • Ollama version: 0.17.4
  • Model: qwen3.5:35b-a3b
  • OS: Ubuntu 24.04.3 LTS, kernel 6.17.0-14-generic
  • GPU: 3x NVIDIA (RTX 3090 Ti 24GB + RTX 3090 24GB + RTX 3060 12GB)
  • num_ctx: 60000
  • stream: false
GiteaMirror added the bug label 2026-05-05 01:57:36 -05:00

@airhand commented on GitHub (Mar 3, 2026):

The same problem occurs with qwen3.5-4b, qwen3.5-9b, and qwen3.5-27b.


@EmilioSchi commented on GitHub (Mar 9, 2026):

Loading the following Modelfile

FROM ./Qwen3.5-9B-Q4_K_M.gguf

PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0
PARAMETER repeat_penalty 1.01
PARAMETER num_ctx 32768
PARAMETER num_predict -1
PARAMETER repeat_last_n -1

PARAMETER stop "</tool_call>"

SYSTEM """
You are an assistant with exactly one tool: bash.

The bash tool executes a shell command on the local system.

When a shell command is needed, respond with ONLY:
<tool_call>
{"name":"bash","arguments":{"command":"bash -lc \\"<command>\\""}}
</tool_call>

Rules:
- Use bash for filesystem inspection, searching, editing files, running programs, and system inspection.
- Prefer combining related operations in one command using && and |.
- Prefer multi-pattern search with grep -E "a|b|c".
- Before creating a file, check whether it exists.
- For complex work, create and maintain TODO.md with one small task per line.
- Write code incrementally in small steps.
- Do not write full files in one large heredoc.
- Prefer small appends, safe replacements, or diff/patch workflows.
- After each command, include a status message inside the shell command:
  && echo "DONE: description" || echo "ERROR: description"

Useful command patterns:
- pwd && ls -la | head
- test -f FILE && echo EXISTS || echo MISSING
- test -d DIR && echo EXISTS || echo MISSING
- grep -nE "TODO|FIXME|BUG" FILE | head
- find . -type f -name "*.py" | xargs grep -nE "pattern"
- wc -l FILE && head -n 20 FILE && tail -n 20 FILE
- echo "line of code" >> FILE
- printf "line1\nline2\n" >> FILE
- sed 's/OLD/NEW/g' FILE > FILE.tmp && mv FILE.tmp FILE
- cp FILE FILE.new && diff -u FILE FILE.new > change.patch
- patch --dry-run FILE change.patch && patch FILE change.patch

If no tool is needed, answer normally.
"""

TEMPLATE """{{- if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}
Available tool:
<tools>
{{- range .Tools }}
{"type":"function","function":{{ .Function }}}
{{- end }}
</tools>

When calling the tool, return ONLY:
<tool_call>
{"name":"bash","arguments":{"command":"bash -lc \\"<command>\\""}}
</tool_call>
{{- end }}<|im_end|>
{{- end }}

{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}

{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>

{{- else if eq .Role "assistant" }}<|im_start|>assistant
{{- if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{- range .ToolCalls }}{"name":"{{ .Function.Name }}","arguments":{{ .Function.Arguments }}}
{{- end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>{{ end }}

{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{- end }}

{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{- end }}
{{- end }}

{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{- end }}
{{- if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{- end }}<|im_start|>assistant
{{- end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}"""

And running it (on a MacBook M2 with 16GB of RAM) with
OLLAMA_CONTEXT_LENGTH=49000 ollama run qwen3.5-9b --experimental --experimental-yolo

With this setup, the Qwen3.5-9B-Q4_K_M.gguf model seems to work better for me, but it may not be a proper solution.


@MunemHashmi commented on GitHub (Mar 13, 2026):

Opened a fix for this in #14835 — the parser was returning a hard error on malformed tool call JSON which bubbled up as a 500. Now it falls back to returning the raw content instead, and also drains the buffer properly when generation gets cut short mid-tool-call.
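
A rough sketch of that kind of fallback, assuming a parser shaped roughly like the one described above (the package, type, and function names here are illustrative, not the actual #14835 patch):

package qwenparser // illustrative name, not the real package

import "encoding/json"

// toolCall mirrors the shape of a Qwen3 <tool_call> payload (illustrative).
type toolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// parseQwenToolCall attempts to decode a tool call. If the JSON is incomplete
// (e.g., generation stopped at num_predict or the context limit), it falls back
// to returning the raw text as plain content instead of returning an error that
// would bubble up as an HTTP 500.
func parseQwenToolCall(raw string) (*toolCall, string) {
	var tc toolCall
	if err := json.Unmarshal([]byte(raw), &tc); err != nil {
		return nil, raw // hand the text back as ordinary content
	}
	return &tc, ""
}

Ideally the caller would also set done_reason "length" on the response when the runner stopped at the token limit, so clients can tell truncation apart from a normal completion.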
