mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #23618] feat: Warn users if context is too large for Model #58696
Originally created by @TomTheWise on GitHub (Apr 12, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23618
Check Existing Issues
Verify Feature Scope
Problem Description
But if it is running into a cap for the model (for example, your query results in 60K tokens but you have only configured llama.cpp / Ollama for 40K), the statistics appear under a message and you will clearly see that the tokens are capped.
The issue: currently, everyday users won't know that their token count was capped and that they hit the limit. They might be wondering why so much was ignored / forgotten, but they have no way of knowing why.
For example, here with llama.cpp (the model currently has a limit of 12K configured in llama.cpp), it is still within the limits, which is why the completion was successful, but it is very close. The next answer will be cut off and users won't know why, no matter how often they retry.

After reaching 12K, llama.cpp cuts off and ends the chat completion:
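As a concrete illustration of what the backend already reports, here is a minimal sketch of the "how close are we to the cap" calculation. The field names follow Ollama's chat-response statistics (`prompt_eval_count`, `eval_count`); the 12K window and the sample numbers are just the example figures from this issue, not real data:

```python
# Sample final-response statistics, as Ollama reports them alongside the answer.
# The concrete numbers below are made up to match the 12K example in this issue.
response_stats = {
    "prompt_eval_count": 11800,  # tokens consumed by the prompt / chat history
    "eval_count": 250,           # tokens generated in the answer
}
num_ctx = 12288  # the context window configured on the backend (assumed)

# Total context consumed by this exchange, and how full the window is.
used = response_stats["prompt_eval_count"] + response_stats["eval_count"]
ratio = used / num_ctx
print(f"context used: {used}/{num_ctx} ({ratio:.0%})")
```

At this point the completion still succeeded, but the window is nearly full, which is exactly the situation an everyday user cannot currently see.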


Desired Solution you'd like
AFTER the LLM has answered, and if the backend uses an API that supports it, it sends the statistics.
Ollama reports the statistics, llama.cpp and ik_llama.cpp report them as well (in a different format than Ollama), and I expect other API providers with their "flavors" of the APIs do so too.
My desired solution would be to introduce two additional, optional parameters PER MODEL: a statistics max ctx warning key, and a statistics max ctx warning value. When the statistic named by the key reaches that value, Open WebUI should warn the user that the quality of the current chat will deteriorate because the context limit has been reached. Ideally as a warning banner under the answer, NOT just a temporary warning in the top-right corner!
Such warnings are known from cloud AI services like the Gemini web page / app.
Alternatives Considered
Additional Context
No response
@Classic298 commented on GitHub (Apr 12, 2026):
Can be done with a filter
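A minimal sketch of such a filter, assuming Open WebUI's filter `outlet` hook and OpenAI-style `usage` fields in the response body; the field names and the 40K context window are assumptions for illustration, not a confirmed implementation:

```python
# Sketch of an Open WebUI filter that appends a context warning under the
# assistant's answer. The `usage` field names and the 40K limit are assumed.
class Filter:
    def __init__(self):
        self.max_ctx = 40960  # assumed configured context window for the model

    def outlet(self, body: dict) -> dict:
        """Runs after the model answers; annotate the reply if context is nearly full."""
        usage = body.get("usage") or {}
        used = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
        if used >= 0.9 * self.max_ctx:
            messages = body.get("messages", [])
            if messages:
                # Append a persistent note to the answer itself, rather than
                # relying on a temporary toast in the corner.
                messages[-1]["content"] += (
                    f"\n\n> Warning: context nearly full ({used}/{self.max_ctx} tokens)."
                )
        return body
```

This keeps the warning attached to the message, which matches the banner-under-the-answer behavior requested above.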
Duplicate