[PR #11301] [CLOSED] model: precompute special tokens in NewVocabulary to avoid repeated alloc #75796

Closed
opened 2026-05-05 08:13:38 -05:00 by GiteaMirror · 0 comments
📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11301
Author: @Stogas
Created: 7/4/2025
Status: Closed

Base: main ← Head: main


📝 Commits (2)

  • a4fd04a model: precompute special tokens in NewVocabulary to avoid repeated allocations
  • 0bb3dcd model: add benchmark for SpecialVocabulary()

📊 Changes

10 files changed (+125 additions, -77 deletions)

📝 model/bytepairencoding_test.go (+1 -5)
📝 model/models/llama/model.go (+10 -9)
📝 model/models/llama4/model.go (+16 -13)
📝 model/models/mistral3/model.go (+10 -9)
📝 model/models/qwen2/model.go (+10 -9)
📝 model/models/qwen25vl/model.go (+10 -9)
📝 model/models/qwen3/model.go (+10 -9)
📝 model/vocabulary.go (+22 -10)
➕ model/vocabulary_bench_test.go (+31 -0)
📝 model/vocabulary_test.go (+5 -4)

📄 Description

This moves the construction of the special token list into the NewVocabulary constructor, so that SpecialVocabulary() no longer needs to call sync.Once.Do() on every invocation. Building the list once per model load avoids the excessive memory allocations that occur during chat requests when sync.Once.Do() is passed a closure (anonymous function).

Fixes #11299
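
The idea can be illustrated with a minimal sketch. This is not the code from the PR: the `Vocabulary` struct below is a simplified, hypothetical stand-in with only the fields needed for the example, and the `special` field name is an assumption. It shows the general pattern of filling a derived slice in the constructor so that the accessor is a plain field read with no `sync.Once` and no closure.

```go
package main

import "fmt"

// Vocabulary is a simplified, hypothetical stand-in for the real type.
// Only the fields needed to illustrate the pattern are included.
type Vocabulary struct {
	Values []string
	BOS    []int32
	EOS    []int32

	// special is filled once by NewVocabulary and only read afterwards,
	// so no lazy initialization is needed. (Field name is an assumption.)
	special []string
}

// NewVocabulary builds the vocabulary and precomputes the special token
// list a single time, at construction.
func NewVocabulary(values []string, bos, eos []int32) *Vocabulary {
	v := &Vocabulary{Values: values, BOS: bos, EOS: eos}

	// Collect BOS ids followed by EOS ids without modifying the inputs.
	ids := append(append([]int32{}, bos...), eos...)
	for _, id := range ids {
		// Skip ids that fall outside the vocabulary.
		if id >= 0 && int(id) < len(values) {
			v.special = append(v.special, values[id])
		}
	}
	return v
}

// SpecialVocabulary returns the precomputed list. It performs no
// allocation and does not construct a closure.
func (v *Vocabulary) SpecialVocabulary() []string {
	return v.special
}

func main() {
	v := NewVocabulary([]string{"<s>", "</s>", "hello"}, []int32{0}, []int32{1})
	fmt.Println(v.SpecialVocabulary()) // prints [<s> </s>]
}
```

One trade-off of this approach is that the list is built even if it is never requested, and callers must go through the constructor rather than a struct literal for the list to be populated.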


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 08:13:38 -05:00
Reference: github-starred/ollama#75796