[PR #14230] fix(server): implement robust truncation loop for embeddings #40455

Open
opened 2026-04-23 01:20:58 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14230
Author: @BurakBebek1
Created: 2/13/2026
Status: 🔄 Open

Base: main ← Head: fix-embed-truncation-final


📝 Commits (1)

  • 580b10d fix(server): implement robust truncation loop for embeddings

📊 Changes

1 file changed (+25 additions, -3 deletions)

View changed files

📝 server/routes.go (+25 -3)

📄 Description

Description:

The Issue

Currently, the /api/embed endpoint can return a 400 Bad Request with the error "the input length exceeds the context length", even when truncate: true is provided.

This happens because of token expansion during the detokenization-to-retokenization cycle. When input contains special characters, emojis, or specific multilingual strings (e.g., Turkish or Japanese), the initial truncation to ctxLen may result in a string that, when re-tokenized by the runner, exceeds the limit by a few tokens.

The Solution
I have implemented a robust verification loop in server/routes.go within the embedWithRetry function.

Key features of this fix:

  1. Validation Loop: After the initial truncation and detokenization, the code re-tokenizes the resulting string to verify the actual token count.
  2. Dynamic Reduction: If the count still exceeds ctxLen, it calculates the overshoot and reduces the token slice. I've added a small buffer (overshoot / 10) to the reduction to minimize loop iterations while staying as close to the limit as possible.
  3. Strict Enforcement: It guarantees that the final payload sent to the model never exceeds the context limit, regardless of character-encoding complexities.
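The loop described above can be sketched as a small self-contained toy (not the actual patch). Here `runnerCount` is a hypothetical stand-in for the runner's tokenizer that charges two tokens per emoji rune, simulating the re-tokenization expansion the PR describes; the real code operates on the model runner's Tokenize/Detokenize.

```go
package main

import "fmt"

// runnerCount is a toy model of the mismatch: the server-side slice counts
// one token per rune, but the "runner" charges high-plane (emoji) runes as
// two tokens, so a naive rune-based truncation can still exceed ctxLen.
func runnerCount(rs []rune) int {
	n := 0
	for _, r := range rs {
		if r >= 0x1F000 { // treat emoji-range runes as 2 tokens
			n += 2
		} else {
			n++
		}
	}
	return n
}

// truncateToFit mirrors the PR's loop: truncate, re-count, then shave off
// the overshoot (plus a small overshoot/10 buffer to cut loop iterations)
// until the payload fits the context limit.
func truncateToFit(input string, ctxLen int) string {
	tokens := []rune(input)
	if len(tokens) > ctxLen {
		tokens = tokens[:ctxLen]
	}
	for runnerCount(tokens) > ctxLen {
		overshoot := runnerCount(tokens) - ctxLen
		cut := overshoot + overshoot/10
		if cut < 1 {
			cut = 1
		}
		if cut >= len(tokens) {
			return "" // nothing fits; caller should surface an error
		}
		tokens = tokens[:len(tokens)-cut]
	}
	return string(tokens)
}

func main() {
	input := "merhaba dünya こんにちは 🌟🌀🌟🌀🌟🌀🌟🌀"
	out := truncateToFit(input, 24)
	fmt.Println(runnerCount([]rune(out)) <= 24) // prints true
}
```

With the toy tokenizer the loop converges in one pass; with a real BPE vocabulary it may take a few iterations, which is what the overshoot buffer is there to minimize.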

Testing & Verification

Tested on Windows/WSL2 using the all-minilm model (256 context limit).

  • Test Input: A 1000+ token string containing Turkish text, Japanese characters, and 150+ emojis (🌟🌀).
  • Result Before Fix: Failed with 400 Bad Request.
  • Result After Fix: Successfully returns embeddings with prompt_eval_count consistently at 256.
Fixes #14186
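The failure can be reproduced against a local server with a request like the one below. This is a sketch based on the test description: the default server address and the exact input string are assumptions, and the real test input was 1000+ tokens.

```shell
# Before the fix, a long multilingual/emoji input could fail with
# 400 Bad Request despite truncate:true; after the fix the request
# returns embeddings with prompt_eval_count capped at 256.
curl -s http://localhost:11434/api/embed -d '{
  "model": "all-minilm",
  "truncate": true,
  "input": "merhaba dünya こんにちは 🌟🌀 ... (repeated until 1000+ tokens)"
}'
```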

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-23 01:20:58 -05:00

Reference: github-starred/ollama#40455