[PR #12061] Add truncate option to generate/chat to error on overflow #13700

Open
opened 2026-04-13 00:33:12 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12061
Author: @eastlondoner
Created: 8/24/2025
Status: 🔄 Open

Base: main ← Head: feat/truncate-generate-chat


📝 Commits (1)

  • 3e0cc49 server(api): add truncate option to generate/chat to error on overflow
      - Add Truncate to GenerateRequest and ChatRequest
      - GenerateHandler validates when truncate=false
      - ChatHandler passes truncate to chatPrompt; chatPrompt validates
      - Tests for generate/chat truncate=false 400 behavior
      - Docs: document truncate for generate/chat
    Backwards compatible: default remains truncation.

📊 Changes

6 files changed (+117 additions, -7 deletions)

📝 api/types.go (+12 -0)
📝 docs/api.md (+2 -0)
📝 server/prompt.go (+37 -5)
📝 server/prompt_test.go (+1 -1)
📝 server/routes.go (+30 -1)
📝 server/routes_generate_test.go (+35 -0)

📄 Description

This PR adds a truncate switch to /api/generate and /api/chat to optionally error when inputs exceed num_ctx.

This mirrors the truncate key that is already available on the embeddings API.

Motivation

  • Current behavior always truncates using a sliding window, which can hide issues and lead to surprising outputs
  • Strict mode helps surface upstream problems sooner

Changes

  • API: add truncate to GenerateRequest and ChatRequest (default true)
  • Generate: when truncate=false, validate the rendered prompt (plus the image token estimate) against num_ctx and return 400 on overflow
  • Chat: wire truncate into chatPrompt, which validates the prompt and returns an error when truncation is disabled
  • Tests: add unit tests for generate/chat overflow with truncate=false
  • Docs: update docs/api.md to document truncate for generate/chat

Backwards compatibility

  • The default remains truncation; behavior changes only when truncate=false is passed

Examples

  • Generate: { truncate: false, options: { num_ctx: 64 }, prompt: 'very long...' } => 400 on overflow
  • Chat: { truncate: false, options: { num_ctx: 64 }, messages: [{ role: 'user', content: 'very long...' }] } => 400 on overflow
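
For reference, a client could build the Generate request body from the examples above along these lines. This is a sketch under assumptions: generatePayload and strictGenerateBody are hypothetical names rather than types from the Ollama client package, and the model field is added because /api/generate requires one even though the examples omit it.

```go
package main

import "encoding/json"

// generatePayload is a hypothetical struct mirroring only the fields used in
// the examples. Truncate is a pointer so that an explicit false is serialized
// instead of being dropped by omitempty.
type generatePayload struct {
	Model    string         `json:"model"`
	Prompt   string         `json:"prompt"`
	Truncate *bool          `json:"truncate,omitempty"`
	Options  map[string]any `json:"options,omitempty"`
}

// strictGenerateBody returns the JSON body for a request that asks the server
// to error on overflow rather than truncate.
func strictGenerateBody(model, prompt string, numCtx int) ([]byte, error) {
	strict := false
	return json.Marshal(generatePayload{
		Model:    model,
		Prompt:   prompt,
		Truncate: &strict,
		Options:  map[string]any{"num_ctx": numCtx},
	})
}
```

The resulting bytes can be sent as the body of a POST to /api/generate; according to the description, a prompt that overflows num_ctx would then yield a 400 response instead of being truncated.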

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:33:12 -05:00
Reference: github-starred/ollama#13700