[GH-ISSUE #15465] fix: recover truncated tool call JSON when max_tokens cuts output #9885

Open
opened 2026-04-12 22:44:40 -05:00 by GiteaMirror · 0 comments

Originally created by @Rih0z on GitHub (Apr 9, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15465

Problem

When a model generates a large tool call (e.g., write_file with HTML content), the JSON arguments are truncated if the output hits the token limit. The entire tool call is then silently dropped instead of being returned with the arguments that were parsed before the cut.

This causes missing tool calls when:

  • write_file produces large HTML/code content
  • Qwen3/Gemma4 models hit their output token limit mid-JSON

Proposed Fix

Add recovery for truncated tool call JSON so that at least the successfully parsed key-value pairs are returned rather than dropping the entire call.
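
To illustrate the kind of recovery proposed above, here is a minimal sketch, not the actual change to tools/tools.go. It walks the top-level object with encoding/json's streaming decoder and keeps every key whose value decoded completely before the cut. The helper name recoverTruncatedArgs and its (map, bool) return shape are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// recoverTruncatedArgs is a hypothetical helper (not part of ollama).
// It returns the top-level key-value pairs that parsed completely, plus a
// flag reporting whether the whole object was well-formed.
func recoverTruncatedArgs(raw string) (map[string]any, bool) {
	dec := json.NewDecoder(strings.NewReader(raw))

	// The arguments must begin with an opening '{'.
	tok, err := dec.Token()
	if err != nil {
		return nil, false
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, false
	}

	args := map[string]any{}
	for dec.More() {
		// Read the next key; a cut inside the key ends recovery.
		keyTok, err := dec.Token()
		if err != nil {
			return args, false
		}
		key, ok := keyTok.(string)
		if !ok {
			return args, false
		}
		// Decode the full value; a cut inside the value discards it.
		var val any
		if err := dec.Decode(&val); err != nil {
			return args, false
		}
		args[key] = val
	}

	// A missing closing '}' means the object was truncated.
	if _, err := dec.Token(); err != nil {
		return args, false
	}
	return args, true
}

func main() {
	// Simulated output that was cut off in the middle of "content".
	args, complete := recoverTruncatedArgs(`{"path":"index.html","content":"<html><body`)
	fmt.Println(args, complete)
}
```

In this sketch a value that is cut mid-way is discarded, so for the write_file case only path survives and the partial content string is lost. Keeping the partial string would need extra handling beyond what is shown here.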

Reproduction

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Write a complete HTML dashboard page"}],
  "tools": [{"type": "function", "function": {"name": "write_file", "parameters": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}}}}],
  "options": {"num_predict": 2048}
}'

Result: the tool call JSON is truncated, and the response contains no tool_calls.
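
To confirm the symptom from the reproduction programmatically, a small check like the following might help. It is a sketch that assumes the final chat response (non-streaming, or the last streamed chunk) carries a done_reason field set to "length" when num_predict is hit, and a message.tool_calls array when a call was parsed. The struct and the function name hitLimitWithoutToolCalls are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatResponse models only the fields this check needs. The field names are
// assumptions about the /api/chat response shape, not a full definition.
type chatResponse struct {
	Message struct {
		Content   string            `json:"content"`
		ToolCalls []json.RawMessage `json:"tool_calls"`
	} `json:"message"`
	Done       bool   `json:"done"`
	DoneReason string `json:"done_reason"`
}

// hitLimitWithoutToolCalls (hypothetical name) reports whether generation
// stopped at the token limit and the response carries no tool calls.
func hitLimitWithoutToolCalls(body []byte) (bool, error) {
	var r chatResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	return r.DoneReason == "length" && len(r.Message.ToolCalls) == 0, nil
}

func main() {
	// Hand-written sample body, not captured from a real server.
	sample := []byte(`{"message":{"role":"assistant","content":""},"done":true,"done_reason":"length"}`)
	hit, err := hitLimitWithoutToolCalls(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("hit limit without tool calls:", hit)
}
```

A true result only shows that the limit was reached with no tool call returned. It does not by itself prove that a tool call was being generated when the cut happened.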

Note: PRs #14835/#14915 fix this for model-specific parsers (Qwen3, etc.). This fix targets the generic tools/tools.go parser used when no model-specific parser is available.

Reference: github-starred/ollama#9885