[GH-ISSUE #8386] Return in a response a flag if the input request was truncated #67439

Closed
opened 2026-05-04 10:20:42 -05:00 by GiteaMirror · 1 comment

Originally created by @MarkWard0110 on GitHub (Jan 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8386

Add a flag to the response that indicates whether Ollama truncated the input request.

As a developer, I would like the Ollama response to have a flag indicating that it truncated the input prompt so that I can initiate client-side behavior based on this information.
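
To make the ask concrete, here is one possible shape for the flag. This is a hypothetical sketch, not Ollama's actual API: the struct is a heavily simplified form of the api package's chat response type, and the `Truncated` field and its JSON name are invented for illustration.

```go
package api

import "time"

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Simplified from Ollama's api.ChatResponse, with most fields omitted.
// The Truncated field does not exist upstream; it is invented here only
// to illustrate the requested flag.
type ChatResponse struct {
	Model     string    `json:"model"`
	CreatedAt time.Time `json:"created_at"`
	Message   Message   `json:"message"`
	Done      bool      `json:"done"`

	// Truncated would report whether the server dropped part of the
	// input prompt to fit the model's context window (hypothetical).
	Truncated bool `json:"truncated,omitempty"`
}
```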

In some situations, the accuracy of the chat response depends upon processing the complete input prompt. A client is not informed if the input was truncated.

A client may be unable to match Ollama's calculation of the context size required for a given input.
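
Until such a flag exists, the closest client-side workaround appears to be a heuristic. The sketch below (assuming a local server on the default port 11434, a model name you would need to adjust, and the documented `prompt_eval_count` response field) flags probable truncation when the prompt token count lands at or near the requested `num_ctx`. It is guesswork of exactly the kind the proposed flag would remove.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatPromptEval sends one non-streaming /api/chat request with the given
// num_ctx and returns the prompt_eval_count reported by the server.
func chatPromptEval(model, prompt string, numCtx int) (int, error) {
	body, _ := json.Marshal(map[string]any{
		"model":    model,
		"messages": []map[string]string{{"role": "user", "content": prompt}},
		"stream":   false,
		"options":  map[string]any{"num_ctx": numCtx},
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var out struct {
		PromptEvalCount int `json:"prompt_eval_count"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	return out.PromptEvalCount, nil
}

func main() {
	const numCtx = 2048
	n, err := chatPromptEval("llama3.2", "a very long prompt ...", numCtx)
	if err != nil {
		panic(err)
	}
	// Heuristic only: if the reported prompt token count lands at or very
	// near the requested context size, the input was probably truncated.
	// The exact margin Ollama leaves is not observable from the client,
	// which is precisely the gap this issue describes.
	const margin = 8 // assumed slack, not derived from Ollama internals
	if n >= numCtx-margin {
		fmt.Println("input was likely truncated")
	}
}
```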

Truncation works well enough for chat conversations, although the user may experience a degradation in the quality of the conversation.
Truncation can also negatively affect an agent. An agent not interacting with a user might receive a longer-than-expected response in a previous iteration, so in the following chat request Ollama truncates the input prompt. The agent would not be aware that the request was truncated; it may try to estimate the prompt size itself, but it may not be able to match Ollama's calculation.

It also appears that in certain situations the chatPrompt call in routes.go will not truncate the input prompt, but the runner will. I saw this when I set the requested context size to one less than the prompt eval count reported for the same request. For example, a chat request reports a prompt eval count of 13; I sent the same request with a context size of 12, and the eval count came back as 55. So the runner also hit the context limit.
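
For anyone trying to reproduce this, here is a sketch of the experiment (same assumptions as the sketch above: local server on the default port, a model name you must substitute). It sends a request once at a generous context size to learn the prompt eval count, then re-sends the identical request with `num_ctx` one below that count and prints both reported counts for comparison.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// promptEval posts the same non-streaming /api/chat request with a given
// num_ctx and returns the server's reported prompt_eval_count.
func promptEval(numCtx int) int {
	body, _ := json.Marshal(map[string]any{
		"model":    "llama3.2", // assumption: substitute a model you have pulled
		"messages": []map[string]string{{"role": "user", "content": "Why is the sky blue?"}},
		"stream":   false,
		"options":  map[string]any{"num_ctx": numCtx},
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var out struct {
		PromptEvalCount int `json:"prompt_eval_count"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	return out.PromptEvalCount
}

func main() {
	// Learn the untruncated prompt size at a generous context window.
	base := promptEval(2048)
	fmt.Printf("prompt_eval_count at num_ctx=2048: %d\n", base)

	// Re-send the identical request with num_ctx one below that count.
	// If the two server-side truncation paths disagree, the reported
	// count will not match any client-side estimate.
	again := promptEval(base - 1)
	fmt.Printf("prompt_eval_count at num_ctx=%d: %d\n", base-1, again)
}
```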

GiteaMirror added the feature request label 2026-05-04 10:20:42 -05:00

@rick-github commented on GitHub (Jan 12, 2025):

#7043, #3839, #1005, #4967

