[PR #12839] [MERGED] truncation: fixed runner truncation logic + removed server truncation #76271

Closed
opened 2026-05-05 08:46:57 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12839
Author: @npardal
Created: 10/29/2025
Status: Merged
Merged: 12/8/2025
Merged by: @npardal

Base: main ← Head: nicole/truncation


📝 Commits (10+)

  • 53add2e allowed base64 encoding
  • c563286 simplfied logic and added tests
  • 3261247 removed truncate
  • a633983 fixed tests
  • 48b9247 added runnign counter and removed redundancy
  • baadc46 integration tests
  • 9aeffd8 added truncation test
  • 1650722 added better erorr handling
  • 7d0fe02 nits
  • 26b1dd0 erorr test

📊 Changes

6 files changed (+278 additions, -88 deletions)

View changed files

📝 integration/embed_test.go (+190 -15)
📝 llm/server.go (+17 -13)
📝 runner/llamarunner/runner.go (+7 -6)
📝 runner/ollamarunner/runner.go (+9 -8)
📝 server/routes.go (+53 -44)
📝 server/sched_test.go (+2 -2)

📄 Description

server/runner: Consolidate embedding truncation and token counting in runner

This PR consolidates all embedding prompt-length checking, truncation, and prompt token counting into the runner to ensure a single source of truth.

Previously, the server and runner each performed context-length checks and truncation, leading to redundant work.

Runner now:

  • Checks whether the tokenized input would exceed the model's context window
  • If the model auto-inserts BOS/EOS, reserves space for them by subtracting up to 2 from the available context (numCtx)
  • If the input exceeds the available context, chops off the extra tokens from the end (instead of the middle)
  • Returns PromptEvalCount (only the real prompt tokens, skipping BOS/EOS and any multimodal image inputs) so clients can see how many tokens were processed
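The truncation steps above can be sketched roughly as follows. This is a minimal illustration, not the actual ollama runner code: the function name `truncateEmbedTokens` and the `addBOS`/`addEOS` flags are hypothetical stand-ins for however the runner exposes the model's auto-insertion behavior.

```go
package main

import "fmt"

// truncateEmbedTokens is a hypothetical sketch of the runner's embedding
// truncation: reserve space for auto-inserted BOS/EOS, chop extra tokens
// from the end, and report only the real prompt tokens kept.
func truncateEmbedTokens(tokens []int, numCtx int, addBOS, addEOS bool) (kept []int, promptEvalCount int) {
	// Reserve space for auto-inserted BOS/EOS by subtracting up to 2
	// from the available context (numCtx).
	avail := numCtx
	if addBOS {
		avail--
	}
	if addEOS {
		avail--
	}

	// If the tokenized input exceeds the available context, truncate
	// from the end rather than the middle.
	if len(tokens) > avail {
		tokens = tokens[:avail]
	}

	// PromptEvalCount counts only the real prompt tokens; the
	// auto-inserted BOS/EOS were never part of this slice.
	return tokens, len(tokens)
}

func main() {
	toks := []int{101, 102, 103, 104, 105, 106}
	kept, count := truncateEmbedTokens(toks, 4, true, true)
	// With numCtx=4 and both BOS and EOS auto-inserted, only 2 real
	// tokens fit.
	fmt.Println(len(kept), count)
}
```

With numCtx=4 and both BOS and EOS reserved, only 2 of the 6 input tokens survive, and PromptEvalCount reports 2, matching the "only real tokens" rule in the description.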

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 08:46:57 -05:00

Reference: github-starred/ollama#76271