mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 03:18:23 -05:00
Postfix title/tag query instead of prefix to avoid trashing KV cache #2548
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @robertvazan on GitHub (Nov 3, 2024).
Feature Request
Is your feature request related to a problem? Please describe.
Given some example chat:
Title is generated in a new chat that looks like this:
The same is done for tags. This forces language models to process the conversation three times even though the conversation can be very long, for example when the model was asked to summarize an article. Since the original content of the KV cache is destroyed in the process, any follow-up question will trigger fourth reprocessing of the conversation.
This is badly slowing things down in local models, which maintain dedicated KV cache for the user specifically to speed up prompt processing. Even in cloud models, this can increase cost, at least with OpenAI, which charges less for recycled context (cached input tokens).
Describe the solution you'd like
Just reformat the title query as follows:
Then discard the last two messages and repeat with tags, then discard the last two messages again and resume normal conversation.
Describe alternatives you've considered
Workarounds: