Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-10 15:54:15 -05:00)
Automatic RAG Bypass for Small Documents #1505
Originally created by @lijiajun1997 on GitHub (Jul 12, 2024).
Is your feature request related to a problem? Please describe.
I'm always frustrated when the system automatically applies Retrieval-Augmented Generation (RAG) even when the uploaded file's token count is below a certain threshold. This is particularly problematic when users need to summarize documents, as RAG may not be the most suitable approach in such scenarios. Additionally, with the advent of models that support larger context windows (e.g., 32k, 128k, 1M tokens), the necessity for RAG diminishes. The use of embedding models also imposes an extra burden on users.
Describe the solution you'd like
I would like the system to bypass RAG when the uploaded file's token count is below a predefined threshold. Instead, the content of the file should be directly included in the conversation context. This would be more efficient and user-friendly, especially for scenarios involving document summarization or when the document size is relatively small.
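The requested behavior can be sketched in a few lines. This is an illustrative mock, not Open WebUI's actual code: the threshold name, the 4-characters-per-token heuristic, and the `retrieve_relevant_chunks` stub are all assumptions (a real implementation would use a proper tokenizer such as tiktoken and the existing retrieval pipeline).

```python
RAG_BYPASS_TOKEN_THRESHOLD = 8000  # hypothetical admin setting


def estimate_tokens(text: str) -> int:
    """Rough estimate; a real implementation would use an actual tokenizer."""
    return len(text) // 4


def retrieve_relevant_chunks(file_text: str, query: str) -> str:
    """Placeholder standing in for the existing embedding/retrieval pipeline."""
    return f"[RAG context for: {query}]"


def build_context(file_text: str, query: str) -> str:
    if estimate_tokens(file_text) <= RAG_BYPASS_TOKEN_THRESHOLD:
        # Small document: inline the whole file, skipping embedding/retrieval.
        return f"Full document:\n{file_text}\n\nQuestion: {query}"
    # Large document: fall back to the normal RAG path.
    return retrieve_relevant_chunks(file_text, query)
```

The only decision point is the threshold comparison; everything below it is the unchanged RAG path, so the change would be minimally invasive.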
Describe alternatives you've considered
Additional context
With the increasing context window sizes in modern models (e.g., 32k, 128k, 1M tokens), the necessity for RAG is reduced. Directly including the file content in the conversation context can simplify the user experience and reduce computational overhead. This feature would be particularly beneficial for users dealing with smaller documents or those requiring straightforward summarization tasks.
@jingyibo123 commented on GitHub (Jul 16, 2024):
+1 for optional direct content input into the context.
Does anyone have thoughts on where to implement that?
@PlebeiusGaragicus commented on GitHub (Jul 29, 2024):
I'm here wondering about documents myself... I'm not sure how it works in detail, but I'm unimpressed so far.
A few points:
1 - Users should be able to upload documents the same way admins are allowed to.
2 - File uploads for chats appear to use a system prompt, but long documents seem to get truncated.
I'm using Open WebUI mainly for the Pipelines; I'm not here for single-pass LLM usage, and RAG is too error-prone that way anyway.
Using a technique like Self-RAG greatly improves retrieval.
I think we should steer more in that direction. I like Open WebUI so far, but it seems too focused on being simply an Ollama frontend.
@otorresXX commented on GitHub (Jul 30, 2024):
+1
@hkosm commented on GitHub (Aug 8, 2024):
+1
Option 3 seems convenient (with a character limit).
@wggcch commented on GitHub (Aug 19, 2024):
This feature would be a valuable addition! It would also be highly advantageous to enable individual chunk size settings for RAG (Retrieval-Augmented Generation) based on the specific model being used. For instance, when working with different LLMs such as Gemma or LLaMA3 within Ollama, or when customizing an LLM with the context size parameter, having the flexibility to adjust the chunk size could greatly improve RAG’s performance and optimization for each model.
For example, if you examine the model files in Ollama, you’ll notice that LLaMA3.1:8b has a context length defined as llama.context_length = 131072, while Gemma2:9b has a context length of gemma2.context_length = 8192. This demonstrates the need for model-specific chunk size settings to maximize efficiency.
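A per-model chunk size could be derived directly from those context lengths. The sketch below is a hypothetical illustration: the context lengths come from the Ollama model files quoted above, but the 1/8 ratio, the cap, and the fallback default are made-up assumptions, not anything Open WebUI implements.

```python
# Context lengths as reported in the Ollama model files cited above.
MODEL_CONTEXT_LENGTHS = {
    "llama3.1:8b": 131072,  # llama.context_length
    "gemma2:9b": 8192,      # gemma2.context_length
}


def chunk_size_for(model: str, ratio: float = 0.125, cap: int = 16000) -> int:
    """Pick a chunk size as a fraction of the model's context window.

    The 1/8 ratio, 16k cap, and 8192 fallback are illustrative choices.
    """
    ctx = MODEL_CONTEXT_LENGTHS.get(model, 8192)
    return min(int(ctx * ratio), cap)
```

With these numbers, gemma2:9b would get 1024-token chunks while llama3.1:8b would hit the 16,000 cap, which is roughly the workaround suggested later in this thread.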
@lijiajun1997 commented on GitHub (Aug 22, 2024):
@tjbck Have you considered addressing this issue?
@tjbck commented on GitHub (Aug 22, 2024):
I'll take a look at this when I have more time, but it's not a priority at the moment. I'm hoping to somewhat integrate this for #3527, but in the meantime you can literally just edit the chunk size to a much larger number (e.g. 15,000) from the admin settings, which would do pretty much the same thing.
@hkosm commented on GitHub (Aug 22, 2024):
That only helps in some cases. E.g., when using PDFs (the most common case, I guess), PyPDFLoader splits by page regardless of the chunk size.
@tjbck Do you have a suggestion for an alternative loader? I would suggest something like unstructured, but that might be too heavy.
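One workaround for the per-page splitting is to join the page texts back into a single string before chunking, so the configured chunk size actually applies. This is a sketch, not Open WebUI code; in LangChain terms the input would be `[d.page_content for d in PyPDFLoader(path).load()]`, since the loader yields one Document per page.

```python
def merge_pages(page_texts: list[str]) -> str:
    """Join per-page extracted text into one document, dropping blank pages."""
    return "\n".join(t.strip() for t in page_texts if t.strip())


def split_by_size(text: str, chunk_size: int) -> list[str]:
    """Naive fixed-size splitter, standing in for the real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

After merging, a large chunk size behaves as intended: a short PDF ends up in a single chunk instead of one chunk per page.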
@PlebeiusGaragicus commented on GitHub (Aug 27, 2024):
I think we need a more mature solution. See this case study that uses the "Self-RAG" technique.
@tjbck commented on GitHub (Sep 6, 2024):
@hkosm Have you tried using tika as a content extraction engine? I've heard it's a lot more reliable than the built-in langchain document parser.
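For anyone wanting to try this, a quick way to stand up Tika is its official Docker image and REST endpoint; sending a document with `PUT /tika` and `Accept: text/plain` returns the extracted text. The image name and port below are Tika's standard defaults; the filename is a placeholder. Open WebUI would then be pointed at the server via the admin settings' content extraction engine option (check the current docs for the exact setting names).

```shell
# Start the Apache Tika server on its default port.
docker run -d -p 9998:9998 apache/tika

# Extract plain text from a sample document (replace report.pdf with your file).
curl -T report.pdf http://localhost:9998/tika --header "Accept: text/plain"
```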
@tjbck commented on GitHub (Sep 21, 2024):
Closing in favour of #3129