[PR #12730] [MERGED] server: Consolidate embedding truncation in runner #76223

Closed
opened 2026-05-05 08:44:14 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12730
Author: @npardal
Created: 10/22/2025
Status: Merged
Merged: 10/27/2025
Merged by: @npardal

Base: main ← Head: nicole/truncation


📝 Commits (10+)

📊 Changes

6 files changed (+264 additions, -84 deletions)


📝 integration/embed_test.go (+215 -0)
📝 llm/server.go (+20 -15)
📝 runner/llamarunner/runner.go (+7 -6)
📝 runner/ollamarunner/runner.go (+7 -6)
📝 server/routes.go (+13 -55)
📝 server/sched_test.go (+2 -2)

📄 Description

server: Consolidate embedding truncation in runner

Currently, checking that embedding prompts fit in the context window (and truncating them when necessary) occurs in two places: the Ollama server and the runner.

This can lead to inconsistencies in both the checks and reported number of tokens processed.

Since we have to do this processing in the runner, this consolidates all of the logic there.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

### 📝 Commits (10+)

- [`53add2e`](https://github.com/ollama/ollama/commit/53add2e0f5f211e692dd1f40e3a9e93f49f963cd) allowed base64 encoding
- [`c563286`](https://github.com/ollama/ollama/commit/c563286e63748b9b61d014ca44d006bde230780b) simplfied logic and added tests
- [`3261247`](https://github.com/ollama/ollama/commit/3261247af05382506fb0ad9bc619604cec265f47) removed truncate
- [`a633983`](https://github.com/ollama/ollama/commit/a6339837e1d02ae409d52cf9f88136a0adbd7119) fixed tests
- [`48b9247`](https://github.com/ollama/ollama/commit/48b9247e87b01c2181a2bd6dd0eb4e8212f3a9e2) added runnign counter and removed redundancy
- [`baadc46`](https://github.com/ollama/ollama/commit/baadc46d0048c15660c13e7fe3c9f59aa6666a45) integration tests
- [`9aeffd8`](https://github.com/ollama/ollama/commit/9aeffd8fa55d0a5120c718df57fe2dd12e3a4221) added truncation test
- [`1650722`](https://github.com/ollama/ollama/commit/16507225528379e8ca291601cdf31306855ea70e) added better erorr handling
- [`7d0fe02`](https://github.com/ollama/ollama/commit/7d0fe02a120335a0dd935c5785b8d84c320d1643) nits
- [`26b1dd0`](https://github.com/ollama/ollama/commit/26b1dd074bd5866fa3fb4315876c51cebb54df8f) erorr test
GiteaMirror added the pull-request label 2026-05-05 08:44:14 -05:00
Reference: github-starred/ollama#76223