[GH-ISSUE #15798] gemma4-64k chat-template special tokens (<|tool_call|>, <|"|>, <|channel|>) leak into OpenAI-compat tool-calling output #72125

Closed
opened 2026-05-05 03:30:55 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @mons-bot on GitHub (Apr 24, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15798

Observed

Running gemma4-64k:latest through Ollama's OpenAI-compat endpoint (/v1/chat/completions with tool schemas), the model emits tool calls as plain text inside the assistant message — often inside a thinking block — with Gemma chat-template special tokens leaked verbatim (<|tool_call|>, <|"|>, <|channel|>, <|tool_response|>). finish_reason returns stop, so the harness ends the turn as if the model had spoken a normal reply, and no tool call is dispatched.

Ollama version

0.21.1. Prior related issues (#15241, #15315) claim a fix in 0.20.6 but the behavior is still reproducible on 0.21.1.

Pattern

  • String arguments wrapped in <|"|>...<|"|> instead of real quotes.
  • call: prefix where a structured tool-call should be.
  • Surrounding <|tool_call|> / <|channel|> template tokens emitted as plain text inside content.

Workaround

None. Swapping model family is the only reliable option.

Closing: will redo with a generic minimal repro if needed.

Originally created by @mons-bot on GitHub (Apr 24, 2026). Original GitHub issue: https://github.com/ollama/ollama/issues/15798 ### Observed Running `gemma4-64k:latest` through Ollama's OpenAI-compat endpoint (`/v1/chat/completions` with tool schemas), the model emits tool calls as plain text inside the assistant message — often inside a `thinking` block — with Gemma chat-template special tokens leaked verbatim (`<|tool_call|>`, `<|"|>`, `<|channel|>`, `<|tool_response|>`). `finish_reason` returns `stop`, so the harness ends the turn as if the model had spoken a normal reply, and no tool call is dispatched. ### Ollama version 0.21.1. Prior related issues (#15241, #15315) claim a fix in 0.20.6 but the behavior is still reproducible on 0.21.1. ### Pattern - String arguments wrapped in `<|"|>...<|"|>` instead of real quotes. - `call:` prefix where a structured tool-call should be. - Surrounding `<|tool_call|>` / `<|channel|>` template tokens emitted as plain text inside `content`. ### Workaround None. Swapping model family is the only reliable option. Closing: will redo with a generic minimal repro if needed.
Author
Owner

@fabiopolimeni commented on GitHub (Apr 30, 2026):

This happens and makes the gemma4 family unusable for agentic workflows that hit OpenAI compatible API (almost all by default use it).

<!-- gh-comment-id:4350863772 --> @fabiopolimeni commented on GitHub (Apr 30, 2026): This happens and makes the gemma4 family unusable for agentic workflows that hit OpenAI compatible API (almost all by default use it).
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#72125