[PR #10577] [MERGED] model: handle multiple eos tokens #23823

Closed
opened 2026-04-19 17:14:23 -05:00 by GiteaMirror · 0 comments

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information

- **Original PR:** https://github.com/ollama/ollama/pull/10577
- **Author:** [@mxyng](https://github.com/mxyng)
- **Created:** 5/5/2025
- **Status:** ✅ Merged
- **Merged:** 5/16/2025
- **Merged by:** [@mxyng](https://github.com/mxyng)
- **Base:** `main` ← **Head:** `mxyng/multiple-eos-tokens`

---

### 📝 Commits (5)

- [`9c799c2`](https://github.com/ollama/ollama/commit/9c799c2649b6d8bc5a253db6d6c5ad46d50fb091) get eos_token_id from generation_config.json
- [`d44ad19`](https://github.com/ollama/ollama/commit/d44ad19e7e7565ce63af36e778c4f5b0aafd79e8) refactor
- [`42e616a`](https://github.com/ollama/ollama/commit/42e616ab45edefa1b8eb8891484c7724dde72eac) include both ids and strings in trace
- [`f7c6885`](https://github.com/ollama/ollama/commit/f7c6885aa31baab5d95f8318a3f2b571fd97e4b0) comments
- [`cb5fc8d`](https://github.com/ollama/ollama/commit/cb5fc8dbbb149d5cb5387c20ab968b0ad7687561) remove special case for gemma3 special vocab (#10743)

### 📊 Changes

**18 files changed** (+282 additions, -182 deletions)

- 📝 `convert/convert.go` (+4 -1)
- 📝 `convert/tokenizer.go` (+32 -0)
- 📝 `convert/tokenizer_test.go` (+61 -0)
- 📝 `llama/llama.go` (+2 -2)
- 📝 `model/bytepairencoding.go` (+4 -122)
- 📝 `model/bytepairencoding_test.go` (+0 -0)
- 📝 `model/models/gemma2/model.go` (+7 -4)
- 📝 `model/models/gemma3/model.go` (+8 -4)
- 📝 `model/models/llama/model.go` (+5 -5)
- 📝 `model/models/llama4/model.go` (+5 -5)
- 📝 `model/models/mistral3/model.go` (+9 -9)
- 📝 `model/models/mllama/model.go` (+5 -5)
- 📝 `model/models/qwen25vl/model.go` (+6 -5)
- 📝 `model/sentencepiece.go` (+4 -19)
- 📝 `model/sentencepiece_test.go` (+0 -0)
- ➕ `model/textprocessor.go` (+17 -0)
- ➕ `model/vocabulary.go` (+112 -0)
- 📝 `sample/samplers.go` (+1 -1)

### 📄 Description

This change allows an arbitrary set of EOS tokens to be defined using `eos_token_ids`. For backwards compatibility, it also ingests the singular `eos_token_id` and `eot_token_id`.
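The backwards-compatibility rule above amounts to merging the plural list with the two legacy singular ids into one set of stop tokens. The following is a minimal sketch of that merge, assuming each value is a plain token id and treating a negative singular value as "not set". `eosIDs` and `isEOS` are hypothetical helpers for illustration, not the functions this PR adds in `model/vocabulary.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// eosIDs merges the plural eos_token_ids list with the legacy singular
// eos_token_id and eot_token_id into one deduplicated, sorted slice.
// Assumption for this sketch: a negative singular id means "not set".
func eosIDs(plural []int32, eosTokenID, eotTokenID int32) []int32 {
	seen := make(map[int32]struct{})
	var ids []int32
	add := func(id int32) {
		if id < 0 {
			return // unset legacy value
		}
		if _, ok := seen[id]; ok {
			return // already collected
		}
		seen[id] = struct{}{}
		ids = append(ids, id)
	}
	for _, id := range plural {
		add(id)
	}
	add(eosTokenID)
	add(eotTokenID)
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// isEOS reports whether id matches any configured end-of-sequence token.
func isEOS(id int32, eos []int32) bool {
	for _, e := range eos {
		if e == id {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative ids: two listed EOS tokens plus a legacy eot_token_id
	// that is not in the list; the duplicate eos_token_id is dropped.
	eos := eosIDs([]int32{128001, 128008}, 128001, 128009)
	fmt.Println(eos)                // [128001 128008 128009]
	fmt.Println(isEOS(128009, eos)) // true
	fmt.Println(isEOS(42, eos))     // false
}
```

With a set rather than a single id, a generation loop can stop on whichever listed terminator the model emits, which is the behavior the description asks for.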

One source for these values may be the model's `generation_config.json`.
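In Hugging Face-style `generation_config.json` files, `eos_token_id` may be either a single integer or a list of integers, so a reader has to accept both shapes. The sketch below decodes only that field with the standard `encoding/json` package. `generationConfig` and `parseEOS` are hypothetical names, not the code added to `convert/tokenizer.go`, and the token ids in `main` are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generationConfig keeps only the field this sketch needs. eos_token_id is
// captured raw because it may be an int or a list of ints.
type generationConfig struct {
	EOSTokenID json.RawMessage `json:"eos_token_id"`
}

// parseEOS returns the EOS ids from a generation_config.json document.
// A missing or null eos_token_id yields an empty result.
func parseEOS(data []byte) ([]int32, error) {
	var cfg generationConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	// JSON null would otherwise decode silently into 0, so check it first.
	if len(cfg.EOSTokenID) == 0 || string(cfg.EOSTokenID) == "null" {
		return nil, nil
	}
	var single int32
	if err := json.Unmarshal(cfg.EOSTokenID, &single); err == nil {
		return []int32{single}, nil
	}
	var many []int32
	if err := json.Unmarshal(cfg.EOSTokenID, &many); err != nil {
		return nil, fmt.Errorf("eos_token_id: expected int or list of ints: %w", err)
	}
	return many, nil
}

func main() {
	for _, doc := range []string{
		`{"bos_token_id": 1, "eos_token_id": 2}`,
		`{"eos_token_id": [106, 1]}`,
	} {
		ids, err := parseEOS([]byte(doc))
		fmt.Println(ids, err) // [2] <nil>, then [106 1] <nil>
	}
}
```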
GiteaMirror added the pull-request label 2026-04-19 17:14:23 -05:00
Reference: github-starred/ollama#23823