[GH-ISSUE #23606] feat: Option to auto truncate chat context when size doesn't fit model (not Ollama) #20027

Closed
opened 2026-04-20 02:36:39 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @xNissX233 on GitHub (Apr 11, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23606

Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

When the chat context grows beyond the context size configured in llama.cpp, Open-WebUI displays an error.
Ollama handles this case automatically, but when not using Ollama there is no context setting for Open-WebUI to handle it itself.
This is extremely relevant because anyone who uses an LLM in the same chat for long enough will eventually hit this limit, and the only workarounds are starting a new chat, manually summarizing, or relying on knowledge/notes tools or external plugins, none of which is ideal.

Desired Solution you'd like

Adding a "max_ctx" parameter that copies the feature of "num_ctx" for Ollama, but handled by Open-WebUI.
This would simply send to llama.cpp as much context as specified, discarding the older messages that wouldn't fit.
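For illustration, a minimal sketch of how such truncation could work on the Open-WebUI side (the function names and the rough token estimate below are hypothetical, not existing Open-WebUI code; a real implementation would use the model's actual tokenizer):

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token); a real implementation
    # would count tokens with the model's tokenizer instead.
    return max(1, len(text) // 4)


def truncate_messages(messages: list[dict], max_ctx: int) -> list[dict]:
    """Keep the newest messages that fit within a max_ctx token budget.

    The system prompt (first message, if its role is "system") is always kept.
    """
    system = messages[:1] if messages and messages[0].get("role") == "system" else []
    rest = messages[len(system):]

    budget = max_ctx - sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    # Walk from the newest message backwards, keeping whatever still fits.
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost

    return system + list(reversed(kept))
```

Walking backwards from the newest message keeps the most recent turns (plus the system prompt) and drops the oldest ones first, which is the behavior a "max_ctx" setting would need to guarantee before the request is sent to llama.cpp.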

Alternatives Considered

No response

Additional Context

No response


Reference: github-starred/open-webui#20027