[GH-ISSUE #14259] Chat history and embedding truncation happens silently with no user-visible indication #35045

Open
opened 2026-04-22 19:12:09 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @akuligowski9 on GitHub (Feb 14, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14259

Problem

When a conversation's token count exceeds the model's context length, Ollama silently drops older messages from the front of the conversation. The only log entry is at slog.Debug level, in server/prompt.go line 73:

```go
slog.Debug("truncating input messages which exceed context length", "truncated", len(msgs[currMsgIdx:]))
```

This means:

  • Users sending messages via the API receive no indication that their conversation history was truncated
  • The truncate parameter defaults to true, so this is the default behavior
  • There is no field in the API response indicating that truncation occurred
  • Users only discover this when the model "forgets" earlier context, leading to confusion

Similarly, the /api/embed endpoint silently truncates input that exceeds context length when truncate=true (the default).
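To make the embed case concrete, here is a minimal sketch. It assumes a local server on the default port and an embedding model such as all-minilm; it sends an input far longer than any small context window, and the response carries no field indicating that most of the input was dropped. /api/chat responses behave the same way for truncated history.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Input far longer than a typical context window; with the default
	// truncate=true it is cut to fit without any error or warning.
	input := strings.Repeat("word ", 100000)
	payload := []byte(`{"model":"all-minilm","input":"` + input + `"}`)

	resp, err := http.Post("http://localhost:11434/api/embed",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	// The reply contains embeddings and timing fields only; nothing in it
	// says the input was truncated.
	fmt.Println(string(out))
}
```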

Why this matters

Context length is one of the most confusing aspects of local LLM usage. Users expect models to maintain full conversation history, and when messages silently disappear, they often attribute the behavior to model quality rather than truncation.

Searching GitHub issues for context-related confusion returns 400+ results. Example issues:

  • #4967 — API silently truncates conversation
  • #3839 — Feature Request: Detect truncation due to exceeding context size
  • #2208 — Feature: API error response in case of exceeding context length
  • #14173 — Ollama 0.15.6 ignores requested context size
  • #14116 — Tiered context length can exhaust VRAM
  • #12407 — Long num_ctx values cause very slow output and timeouts
  • #6286 — Context window size cannot be changed
  • #12474 — num_ctx incorrect description in documentation

Proposed change

  1. Upgrade the truncation log in server/prompt.go from slog.Debug to slog.Warn so it appears in normal server logs
  2. Add the number of messages dropped and the active context length to the log message
  3. Add a similar slog.Warn in the embed endpoint when input is silently truncated

This is a logging-only change: no change to the API surface and no behavior change. A sketch of the proposed log line appears below.
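A rough sketch of items 1 and 2, written as a drop-in replacement for the existing line in server/prompt.go. msgs and currMsgIdx come from the code quoted in the Problem section; the context-length variable (opts.NumCtx here) and the field names are assumptions about what the surrounding function has in scope, not the final wording:

```go
// Replaces the slog.Debug call quoted above. Messages before currMsgIdx
// are the ones being dropped; msgs[currMsgIdx:] are kept.
slog.Warn("truncating input messages which exceed context length",
	"messages_dropped", currMsgIdx,
	"messages_kept", len(msgs[currMsgIdx:]),
	"num_ctx", opts.NumCtx,
)
```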

How it will be tested

  • Existing tests continue to pass (no behavior change)
  • Manual verification: send a long conversation that exceeds context length and confirm the warning appears in server logs (expected shape sketched below)
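
For the manual check, with slog's default text handler the new warning would look roughly like the line below. The exact shape depends on how Ollama configures its logger, and the field values are illustrative:

```
time=2026-02-14T12:00:00.000-05:00 level=WARN msg="truncating input messages which exceed context length" messages_dropped=4 messages_kept=8 num_ctx=4096
```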
GiteaMirror added the documentation label 2026-04-22 19:12:09 -05:00
Reference: github-starred/ollama#35045