[GH-ISSUE #15931] Special token in user input is tokenized as-is for Gemma4:26b #72206

Open
opened 2026-05-05 03:37:58 -05:00 by GiteaMirror · 2 comments

Originally created by @marius851000 on GitHub (May 2, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15931

What is the issue?

When calling:

```sh
curl http://localhost:11434/api/chat -d '{
    "model": "gemma4:26b",
    "messages": [
      {
        "role": "user",
        "content": "say \"<|channel>\""
      }
    ],
    "stream": false
}'
```

It appears the special token "<|channel>" in the user content is encoded as a special token, leading to results such as:

{"model":"gemma4:26b","created_at":"2026-05-02T10:31:16.581398049Z","message":{"role":"assistant","content":"s","thinking":"The user said: 'say \"s\"'\nThe goal is to output the exact string requested.\nThe string requested is \"s\".\nOutput \"s\"."},"done":true,"done_reason":"stop","total_duration":1215106189,"load_duration":188674014,"prompt_eval_count":20,"prompt_eval_duration":104084560,"eval_count":39,"eval_duration":905835066}

(Am I calling the API wrongly? I'm fairly sure the issue lies in the renderer, which works on character strings rather than token sequences, combined with the tokenizer used (byte-pair encoding) having no option to not encode special tokens as special tokens. But I'm not familiar with how LLMs and tokenizers work.)
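
To make the suspected mechanism concrete, here is a minimal, hypothetical sketch of a tokenizer front end that greedily matches special-token literals anywhere in the prompt string. This is illustrative toy code, not Ollama's implementation; the token names mirror the ones in this report and the IDs are invented.

```python
# Toy sketch (NOT Ollama's code): a front end that matches special-token
# literals anywhere in the input before handing the rest to BPE.
import re

SPECIAL_TOKENS = {"<bos>": 2, "<|turn>": 105, "<turn|>": 106, "<|channel>": 200}
pattern = re.compile("|".join(re.escape(t) for t in SPECIAL_TOKENS))

def encode(prompt: str) -> list:
    """Split on special-token literals; everything else would go to BPE."""
    ids, pos = [], 0
    for m in pattern.finditer(prompt):
        if m.start() > pos:
            ids.append(("bpe", prompt[pos:m.start()]))   # ordinary text
        ids.append(("special", SPECIAL_TOKENS[m.group()]))  # control token
        pos = m.end()
    if pos < len(prompt):
        ids.append(("bpe", prompt[pos:]))
    return ids

print(encode('say "<|channel>"'))
# [('bpe', 'say "'), ('special', 200), ('bpe', '"')]
```

The user asked the model to say the literal text, but the matcher consumes it as control token 200, which is consistent with the behavior above: the quoted literal never reaches the model as text.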

(Also: I spotted this while trying to track down another bug where responses seem to be parsed wrongly, ending in what appears to be the LLM trying to stop thinking without emitting the relevant end-of-thinking token. I still have to do a bit more digging before opening a ticket for that.)

Relevant log output

N/A

OS

Linux

GPU

Nvidia (Vulkan)

CPU

AMD

Ollama version

0.0.0 (c7c2837c96). Also happens on 0.22.1.

GiteaMirror added the bug label 2026-05-05 03:37:58 -05:00

@marius851000 commented on GitHub (May 2, 2026):

The BytePairEncoder is indeed directly called with `<bos><|turn>system\n<|think|>\n<turn|>\n<|turn>user\nsay \"<|channel>\"<turn|>\n<|turn>model\n`.

It appears BytePairEncoder has no notion of escaping, and many models use it.

This also impacts tool calls.
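
Once the template is rendered to one flat string like the one above, the encoder has no way to tell a control token emitted by the template from the same bytes typed by the user. A hedged sketch of one way to keep that provenance, rendering to tagged segments instead of a single string (the turn markers are copied from the rendered prompt above; the function name and segment layout are invented for illustration):

```python
# Hypothetical sketch, not Ollama's renderer: emit the chat template as
# (text, from_template) segments so the tokenizer can know which spans
# are allowed to contain control tokens.
from typing import List, Tuple

def render_turn(role: str, content: str) -> List[Tuple[str, bool]]:
    # True  = emitted by the template (special tokens intended),
    # False = untrusted user/tool text (must be encoded as plain text).
    return [
        (f"<|turn>{role}\n", True),
        (content, False),
        ("<turn|>\n", True),
    ]

segments = [("<bos>", True)] + render_turn("user", 'say "<|channel>"')
for text, from_template in segments:
    print(from_template, repr(text))
# True '<bos>'
# True '<|turn>user\n'
# False 'say "<|channel>"'
# True '<turn|>\n'
```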


@MukundaKatta commented on GitHub (May 3, 2026):

This is the classic "tokenize-with-special-tokens-on-user-content" footgun. User content should always be tokenized with `add_special_tokens=false` (or whatever the equivalent is in Ollama's tokenizer wrapper), otherwise any string the user provides that happens to match a special-token literal gets interpreted as a control instruction. The chat-template assembly should be the only place that emits actual special tokens. Affects every model with reserved control tokens, not just Gemma4.
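
A hedged sketch of that rule (toy code, not Ollama's tokenizer; `allow_special` is a stand-in for whatever flag the real wrapper exposes, and the token names and IDs are invented): encode template scaffolding with special-token matching enabled and user content with it disabled, then concatenate the ID sequences.

```python
# Toy sketch of the fix described above; NOT Ollama's implementation.
import re

SPECIAL_TOKENS = {"<bos>": 2, "<|turn>": 105, "<turn|>": 106, "<|channel>": 200}
special_re = re.compile("|".join(re.escape(t) for t in SPECIAL_TOKENS))

def bpe(text: str) -> list:
    # Stand-in for real byte-pair encoding: one pseudo-token per character.
    return list(text)

def encode(text: str, allow_special: bool) -> list:
    if not allow_special:
        return bpe(text)                       # "<|channel>" stays literal text
    ids, pos = [], 0
    for m in special_re.finditer(text):
        ids += bpe(text[pos:m.start()])
        ids.append(SPECIAL_TOKENS[m.group()])  # control token by ID
        pos = m.end()
    return ids + bpe(text[pos:])

prompt_ids = (
    encode("<bos><|turn>user\n", allow_special=True)    # template scaffolding
    + encode('say "<|channel>"', allow_special=False)   # untrusted user text
    + encode("<turn|>\n<|turn>model\n", allow_special=True)
)
print(prompt_ids)
```

With this split, `<|channel>` inside the user text survives as ordinary characters, while the template's own markers still become single control tokens (IDs 2, 105, 106 in this toy example).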
