[PR #12519] [MERGED] add truncate and shift parameters #45104

Closed
opened 2026-04-25 00:47:22 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12519
Author: @jmorganca
Created: 10/7/2025
Status: Merged
Merged: 10/9/2025
Merged by: @jmorganca

Base: main ← Head: jmorganca/truncate


📝 Commits (8)

📊 Changes

8 files changed (+272 additions, -67 deletions)


📝 api/types.go (+16 -0)
📝 llm/server.go (+4 -2)
📝 runner/llamarunner/runner.go (+22 -0)
📝 runner/ollamarunner/runner.go (+25 -0)
📝 server/prompt.go (+2 -2)
📝 server/prompt_test.go (+64 -38)
📝 server/routes.go (+38 -25)
📝 server/routes_generate_test.go (+101 -0)

📄 Description

This adds two new optional fields to the API that should, for now, go mostly unused except for the OpenAI compatibility middleware:

  1. truncate: whether to automatically truncate the prompt (by truncating the list of messages, or a single message). If false, the request instead fails with a 400 error and a descriptive message stating that the input length exceeds the context length.
  2. shift: whether to shift the context when reaching the context limit. If false, generation instead stops and the response carries "done_reason": "limit".

Both fields default to true if unset, as this is the behavior in Ollama's API today. However, they are named and designed so that they could eventually default to false, which is the behavior of other APIs (e.g. OpenAI, Anthropic). The plan is to make the OpenAI-compatible API disable these fields by default (in a follow-up change), and at some point in the future, disable them for Ollama's API as well.

Motivation: modern applications (e.g. codex) need and expect feedback when reaching the context limit in order to compress or manage it at the application layer.
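
As a rough illustration of the application-layer handling described above, the sketch below classifies a response when both fields are set to false. The `chatResponse` type, the `outcome` values, and the `classify` function are hypothetical names introduced here; the only details taken from the description are the 400 status for an over-long prompt and `"done_reason": "limit"` when the context fills during generation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// chatResponse is a hypothetical subset of the response body.
type chatResponse struct {
	Done       bool   `json:"done"`
	DoneReason string `json:"done_reason"`
}

// outcome is a hypothetical label for how the request ended.
type outcome string

const (
	outcomeOK            outcome = "ok"
	outcomePromptTooLong outcome = "prompt_too_long"
	outcomeContextFull   outcome = "context_full"
)

// classify maps an HTTP status and body to an outcome.
// Assumption: every 400 is treated as "prompt too long"; a real client
// should also inspect the error message, since a 400 can have other causes.
func classify(status int, body []byte) (outcome, error) {
	if status == http.StatusBadRequest {
		return outcomePromptTooLong, nil
	}
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if resp.DoneReason == "limit" {
		return outcomeContextFull, nil
	}
	return outcomeOK, nil
}

func main() {
	o, err := classify(http.StatusOK, []byte(`{"done":true,"done_reason":"limit"}`))
	if err != nil {
		panic(err)
	}
	// On context_full or prompt_too_long, the application can compress
	// or drop older messages and retry.
	fmt.Println(o)
}
```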


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

### 📝 Commits (8)

- [`729bb18`](https://github.com/ollama/ollama/commit/729bb18f27400ec010b40589c9cf106d9a4a6683) add quiet truncate and shift parameters
- [`684f4c6`](https://github.com/ollama/ollama/commit/684f4c6d78a1b07547d14468ffcb60729f117aa4) handle errors properly
- [`8adbc59`](https://github.com/ollama/ollama/commit/8adbc5933348ffc47aef62620f81a2a5170a6715) fix chatPrompt changes
- [`713d729`](https://github.com/ollama/ollama/commit/713d729317eeee874a56f619635319734849d418) add tests for new error returns
- [`28cdebc`](https://github.com/ollama/ollama/commit/28cdebc84977967e3e6836eee3c19f4bb06ef719) gofumpt
- [`b72fd22`](https://github.com/ollama/ollama/commit/b72fd226a74d61e5c0cc781fa46a3f6fd23b0ffd) update shifting logic
- [`8506571`](https://github.com/ollama/ollama/commit/85065710c1b4f10c8e132607057e569cc35d5255) address comments
- [`e484ab7`](https://github.com/ollama/ollama/commit/e484ab7111f5d01627e2adb2e7e3ff702d893f6c) trim error message
GiteaMirror added the pull-request label 2026-04-25 00:47:22 -05:00
Reference: github-starred/ollama#45104