[GH-ISSUE #15278] Bug: gemma4 tokenizer drops multi-byte characters (German Umlaute ä, ö, ü, ß) #9773

Closed
opened 2026-04-12 22:39:30 -05:00 by GiteaMirror · 2 comments

Originally created by @Tweschke3 on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15278

What is the issue?

Environment

  • OS: Ubuntu Linux
  • Ollama version: 0.20.0-rc1
  • Model: gemma4

Describe the bug

When generating German text with the new gemma4 model, all multi-byte UTF-8 characters, such as the German Umlaute (ä, ö, ü) and the letter "ß", are missing from the output. This leaves gaps inside words or runs words together.

Steps to reproduce

  1. Run ollama run gemma4
  2. Use a prompt that forces the model to use Umlaute, for example:
    Bitte schreibe genau diesen Satz: Der süße Bär isst schöne Äpfel.
  3. Observe the output. The characters ä, ö, ü, and ß will be missing.

Expected behavior

The model should correctly output multi-byte UTF-8 characters.
Expected: Der süße Bär isst schöne Äpfel.

Actual behavior

The characters are dropped entirely.
Actual output looks something like this: Der s e B r isst sch ne pfiel. (or similar gaps where the Umlaute should be).

Additional context

  • This is not a local terminal encoding issue. My system locale is strictly set to UTF-8 (LANG=de_DE.UTF-8, LC_ALL=).
  • I tested ollama run gemma3 in the exact same environment, and the Umlaute work perfectly fine there.
  • This strongly suggests a tokenizer mapping/encoding issue specific to the gemma4 GGUF conversion currently served in the Ollama registry.
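To make the symptom concrete, here is a minimal Python sketch of why these particular characters are the ones at risk. Each of ä, ö, ü, ß and Ä takes two bytes in UTF-8, so any decoder that turns byte pieces into text one piece at a time and discards bytes that are invalid on their own will silently lose them. This is an illustration only: the one-byte-per-piece split and the function names are hypothetical, and nothing here is Ollama's actual code or a confirmed root cause.

```python
# Illustration only: not Ollama's code and not a confirmed root cause.
SENTENCE = "Der süße Bär isst schöne Äpfel."


def multibyte_chars(text: str) -> dict[str, int]:
    """Map each character needing more than one UTF-8 byte to its byte count."""
    return {
        ch: len(ch.encode("utf-8"))
        for ch in text
        if len(ch.encode("utf-8")) > 1
    }


def decode_per_piece(pieces: list[bytes]) -> str:
    """Decode every piece in isolation, dropping bytes that are invalid alone."""
    return "".join(p.decode("utf-8", errors="ignore") for p in pieces)


def decode_joined(pieces: list[bytes]) -> str:
    """Join all pieces first, then decode once, so split characters survive."""
    return b"".join(pieces).decode("utf-8")


# Hypothetical worst case: every byte arrives as its own piece.
pieces = [bytes([b]) for b in SENTENCE.encode("utf-8")]

print(multibyte_chars(SENTENCE))
print(decode_per_piece(pieces))  # multi-byte characters vanish
print(decode_joined(pieces))     # original sentence is restored
```

The per-piece result has the same shape as the reported output: ASCII text survives and every two-byte character disappears. That pattern is consistent with a byte-level decoding problem, but it does not by itself tell you where in the pipeline the bytes were lost.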

Relevant log output

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "Write this exact sentence: Der süße Bär isst schöne Äpfel.",
  "stream": false
}'

{"model":"gemma4","created_at":"2026-04-03T11:50:07.762519177Z","response":"Der s B isst sch el.","done":true,"done_reason":"stop","context":[2,105,9731,107,98,106,107,105,2364,107,6974,672,4453,13315,236787,9356,23618,16756,603,4793,563,540,113775,31083,22744,535,236761,106,107,105,4368,107,100,45518,107,818,2430,8150,786,531,4903,496,3530,13315,236787,623,17361,503,603,563,540,4058,862,1781,564,1202,531,3938,7121,1144,901,3847,236761,101,17361,503,603,563,540,4058,862,236761],"total_duration":950221041,"load_duration":281304745,"prompt_eval_count":32,"prompt_eval_duration":25841759,"eval_count":49,"eval_duration":619591180}
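The check above can also be scripted, which helps when comparing Ollama versions or models. The following Python sketch uses only the standard library; the endpoint, model name, and request fields are taken from the curl call above, while the helper names `missing_non_ascii` and `generate` are hypothetical. It runs offline against the `response` value recorded in this issue, and `generate` is defined but not called so the example does not need a running server.

```python
import json
import urllib.request

EXPECTED = "Der süße Bär isst schöne Äpfel."


def missing_non_ascii(expected: str, actual: str) -> list[str]:
    """Return the non-ASCII characters of `expected` that never occur in `actual`."""
    return sorted({ch for ch in expected if ord(ch) > 127 and ch not in actual})


def generate(prompt: str, model: str = "gemma4",
             url: str = "http://localhost:11434/api/generate") -> str:
    """POST the same JSON body as the curl call and return the `response` field."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Offline check against the response recorded in this issue.
recorded = json.loads('{"response": "Der s B isst sch el.", "done": true}')
print(missing_non_ascii(EXPECTED, recorded["response"]))

# Against a live server (assumes Ollama is listening on localhost:11434):
#   print(missing_non_ascii(EXPECTED, generate("Write this exact sentence: " + EXPECTED)))
```

An empty list means every non-ASCII character of the expected sentence appears somewhere in the reply. The check is deliberately loose, since the model may add its own wording around the sentence.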

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.20.0-rc1

GiteaMirror added the bug label 2026-04-12 22:39:30 -05:00

@szmarczak commented on GitHub (Apr 3, 2026):

You're using an out-of-date version of Ollama. Upgrade to the newest version.


@Tweschke3 commented on GitHub (Apr 3, 2026):

Thank you for your response. Upgrading to version 0.20.0 using the installation script solved the problem.

Reference: github-starred/ollama#9773