mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-12 08:54:31 -05:00
[GH-ISSUE #23583] feat: Improve document loading without RAG for prompt caching #58685
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @arty-hlr on GitHub (Apr 10, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23583
Check Existing Issues
Verify Feature Scope
Problem Description
With "Bypass Embedding and Retrieval", it seems that the RAG template is still used, and with a system prompt (for example from a folder), the order is very inefficient for prompt caching:
Expected prompt/document order:
System prompt
RAG template
Document
User query
Assistant answer
User query
Assistant answer
Actual prompt/document order:
System prompt
User query
Assistant answer
RAG template
Document
User query
Assistant answer
Problem
This behavior unfortunately invalidates prompt caching repeatedly as you'd have:
System prompt
RAG template
Document
User query
Assistant answer
but then at the next turn:
System prompt
User query
Assistant answer
RAG template
Document
User query
Assistant answer
This is especially problematic when the document is big (50-100k tokens) as moving its place around invalidates the whole cachine.
It is also clear in the UI in the way the document is retrieved:
It shows "Retrieved 1 source" before each assistant response. This makes sense when actually using RAG and dynamically getting chunks based on the user's query, but with "Bypass Embedding and Retrieval" mode the whole document is supposed to be included statically, so it shouldn't be added before each new user query.
Desired Solution you'd like
Have "Bypass Embedding and Retrieval" put the RAG template and the document at the place where the user asks for it. For example if the user adds the document in the first message, put the RAG template and document after the system prompt and before the user query. If the user adds a document in a later turn, put the RAG template and the document before that query. The main point would be not to add the document in "Bypass Embedding and Retrieval" before each user query, but leave it at the point where it was added.
Alternatives Considered
No alternative possible afaik.
Additional Context
No response
@arty-hlr commented on GitHub (Apr 11, 2026):
Based on https://docs.openwebui.com/troubleshooting/performance/ this should be fixed by setting RAG_SYSTEM_PROMPT which fixes the issue for prompt caching. This should definitely be default when "Bypass embedding and retrieval" is enabled though!