[GH-ISSUE #12292] feat: Sliding window context to handle long contexts #32066
Originally created by @AlbertoSinigaglia on GitHub (Apr 1, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12292
Check Existing Issues
Problem Description
In https://github.com/ollama/ollama/issues/9890 I discussed how long contexts completely break the usability of models, because the whole context has to be preloaded up front.
Desired Solution you'd like
One of the founders/maintainers of Ollama intelligently suggested a sliding window approach on the client side: https://github.com/ollama/ollama/issues/9890#issuecomment-2740319483.
It would be extremely interesting for Open WebUI to support this model-wide: preload a minimal context length, then increase it whenever the number of tokens in the chat gets close to the current limit.
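For concreteness, a minimal sketch of the sizing logic this would need (the bucket sizes, the headroom factor, and the `next_num_ctx` name are illustrative assumptions, not part of the proposal):

```python
def next_num_ctx(token_count: int,
                 buckets=(2048, 4096, 8192, 16384, 32768, 65536, 131072)) -> int:
    """Pick the smallest context bucket that fits the chat so far, with ~25%
    headroom so the KV cache is not reallocated on every single message."""
    needed = int(token_count * 1.25)
    for size in buckets:
        if needed <= size:
            return size
    return buckets[-1]  # cap at the largest context we are willing to allocate
```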
Alternatives Considered
No response
Additional Context
With the introduction of the "Bypass Embedding and Retrieval" option, a 128k context length becomes almost essential, to say nothing of the 1M-token context Google serves with Gemini.
@Classic298 commented on GitHub (Apr 1, 2025):
This should be implemented with filters. There are already some example filters available which you can use:
https://openwebui.com/f/hub/context_clip_filter
https://openwebui.com/f/houxin/token_clip_filter
Here is the search query: https://openwebui.com/functions?query=context
@AlbertoSinigaglia commented on GitHub (Apr 1, 2025):
These clip the history to fit in the context window. I'm looking for the opposite: gradually increasing the model's context size to fit the chat, instead of allocating a 128k-token KV cache in Ollama for a "hi there my dear LLM".
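To make that concrete, a rough sketch of how this could look as an Open WebUI inlet filter that grows `num_ctx` with the chat. The chars/4 token estimate and the assumption that an `options.num_ctx` field set here gets forwarded to Ollama are mine, so treat this as a starting point rather than a tested implementation:

```python
from pydantic import BaseModel


class Filter:
    class Valves(BaseModel):
        min_ctx: int = 2048      # context preloaded for short chats
        max_ctx: int = 131072    # upper bound for long chats
        headroom: float = 1.25   # grow before the chat actually hits the limit

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, __user__: dict = None) -> dict:
        # Rough token estimate: ~4 characters per token across all messages.
        chars = sum(len(str(m.get("content", ""))) for m in body.get("messages", []))
        needed = int(chars / 4 * self.valves.headroom)

        # Grow num_ctx in power-of-two steps so the KV cache is only
        # reallocated occasionally, not on every request.
        num_ctx = self.valves.min_ctx
        while num_ctx < needed and num_ctx < self.valves.max_ctx:
            num_ctx *= 2

        # Assumption: this options field reaches Ollama unchanged.
        body.setdefault("options", {})["num_ctx"] = min(num_ctx, self.valves.max_ctx)
        return body
```

Growing in power-of-two steps means a long chat triggers only a handful of model reloads with a larger KV cache, rather than one per message.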
@Classic298 commented on GitHub (Apr 1, 2025):
Ah, my bad, I misunderstood the request then!
@AlbertoSinigaglia commented on GitHub (Apr 1, 2025):
No problem! Still nice functions to have