Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-10 15:54:15 -05:00)
Automatic RAG Bypass for Small Documents #1505
Originally created by @lijiajun1997 on GitHub (Jul 12, 2024).
Is your feature request related to a problem? Please describe.
I'm always frustrated when the system automatically applies Retrieval-Augmented Generation (RAG) even when the uploaded file's token count is below a certain threshold. This is particularly problematic when users need to summarize documents, as RAG may not be the most suitable approach in such scenarios. Additionally, with the advent of models that support larger context windows (e.g., 32k, 128k, 1M tokens), the necessity for RAG diminishes. The use of embedding models also imposes an extra burden on users.
Describe the solution you'd like
I would like the system to bypass RAG when the uploaded file's token count is below a predefined threshold. Instead, the content of the file should be directly included in the conversation context. This would be more efficient and user-friendly, especially for scenarios involving document summarization or when the document size is relatively small.
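The requested behavior can be sketched in a few lines. This is an illustrative mock, not Open WebUI's actual code: the threshold name, the 4-characters-per-token heuristic, and the `retrieve_relevant_chunks` stub are all assumptions (a real implementation would use a proper tokenizer such as tiktoken and the existing retrieval pipeline).

```python
RAG_BYPASS_TOKEN_THRESHOLD = 8000  # hypothetical admin setting


def estimate_tokens(text: str) -> int:
    """Rough estimate; a real implementation would use an actual tokenizer."""
    return len(text) // 4


def retrieve_relevant_chunks(file_text: str, query: str) -> str:
    """Placeholder standing in for the existing embedding/retrieval pipeline."""
    return f"[RAG context for: {query}]"


def build_context(file_text: str, query: str) -> str:
    if estimate_tokens(file_text) <= RAG_BYPASS_TOKEN_THRESHOLD:
        # Small document: inline the whole file, skipping embedding/retrieval.
        return f"Full document:\n{file_text}\n\nQuestion: {query}"
    # Large document: fall back to the normal RAG path.
    return retrieve_relevant_chunks(file_text, query)
```

The only decision point is the threshold comparison; everything below it is the unchanged RAG path, so the change would be minimally invasive.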
Describe alternatives you've considered
Additional context
With the increasing context window sizes in modern models (e.g., 32k, 128k, 1M tokens), the necessity for RAG is reduced. Directly including the file content in the conversation context can simplify the user experience and reduce computational overhead. This feature would be particularly beneficial for users dealing with smaller documents or those requiring straightforward summarization tasks.
@jingyibo123 commented on GitHub (Jul 16, 2024):
+1 for optional direct content input into the context.
Does anyone have thoughts on where to implement that?
@PlebeiusGaragicus commented on GitHub (Jul 29, 2024):
I'm here wondering about documents myself... I'm not sure how it works in detail, but I'm unimpressed so far.
A few points:
1 - Users should be able to upload documents the same way admins are allowed to.
2 - File uploads for chats appear to use a system prompt, but long documents seem to get truncated.
I'm using Open WebUI mainly for the Pipelines; I'm not here for single-pass LLM usage, and RAG is too error-prone that way anyway.
Using a technique like Self-RAG greatly improves retrieval.
I think we should steer more in that direction. I like Open WebUI so far, but it seems too focused on being simply an Ollama frontend.
@otorresXX commented on GitHub (Jul 30, 2024):
+1
@hkosm commented on GitHub (Aug 8, 2024):
+1
Option 3 seems convenient (with a character limit).
@wggcch commented on GitHub (Aug 19, 2024):
This feature would be a valuable addition! It would also be highly advantageous to enable individual chunk size settings for RAG (Retrieval-Augmented Generation) based on the specific model being used. For instance, when working with different LLMs such as Gemma or LLaMA3 within Ollama, or when customizing an LLM with the context size parameter, having the flexibility to adjust the chunk size could greatly improve RAG’s performance and optimization for each model.
For example, if you examine the model files in Ollama, you’ll notice that LLaMA3.1:8b has a context length defined as llama.context_length = 131072, while Gemma2:9b has a context length of gemma2.context_length = 8192. This demonstrates the need for model-specific chunk size settings to maximize efficiency.
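A per-model chunk size could be derived directly from those context lengths. The sketch below is a hypothetical illustration: the context lengths come from the Ollama model files quoted above, but the 1/8 ratio, the cap, and the fallback default are made-up assumptions, not anything Open WebUI implements.

```python
# Context lengths as reported in the Ollama model files cited above.
MODEL_CONTEXT_LENGTHS = {
    "llama3.1:8b": 131072,  # llama.context_length
    "gemma2:9b": 8192,      # gemma2.context_length
}


def chunk_size_for(model: str, ratio: float = 0.125, cap: int = 16000) -> int:
    """Pick a chunk size as a fraction of the model's context window.

    The 1/8 ratio, 16k cap, and 8192 fallback are illustrative choices.
    """
    ctx = MODEL_CONTEXT_LENGTHS.get(model, 8192)
    return min(int(ctx * ratio), cap)
```

With these numbers, gemma2:9b would get 1024-token chunks while llama3.1:8b would hit the 16,000 cap, which is roughly the workaround suggested later in this thread.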
@lijiajun1997 commented on GitHub (Aug 22, 2024):
@tjbck Have you considered addressing this issue?
@tjbck commented on GitHub (Aug 22, 2024):
I'll take a look at this when I have more time, but it's not a priority at the moment. I'm hoping to somewhat integrate this for #3527, but in the meantime you can literally just edit the chunk size to a much larger number (e.g. 15,000) from the admin settings, which would do pretty much the same thing.
@hkosm commented on GitHub (Aug 22, 2024):
That only helps in some cases. E.g., when using PDFs (the most common case, I guess), PyPDFLoader splits by page regardless of the chunk size.
@tjbck Do you have a suggestion for an alternative loader? I would suggest something like unstructured, but that might be too heavy.
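One workaround for the per-page splitting is to join the page texts back into a single string before chunking, so the configured chunk size actually applies. This is a sketch, not Open WebUI code; in LangChain terms the input would be `[d.page_content for d in PyPDFLoader(path).load()]`, since the loader yields one Document per page.

```python
def merge_pages(page_texts: list[str]) -> str:
    """Join per-page extracted text into one document, dropping blank pages."""
    return "\n".join(t.strip() for t in page_texts if t.strip())


def split_by_size(text: str, chunk_size: int) -> list[str]:
    """Naive fixed-size splitter, standing in for the real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

After merging, a large chunk size behaves as intended: a short PDF ends up in a single chunk instead of one chunk per page.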
@PlebeiusGaragicus commented on GitHub (Aug 27, 2024):
I think we need a more mature solution. See this case study that uses the "Self-RAG" technique.
@tjbck commented on GitHub (Sep 6, 2024):
@hkosm Have you tried using tika as a content extraction engine? I've heard it's a lot more reliable than the built-in langchain document parser.
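For anyone wanting to try this, a quick way to stand up Tika is its official Docker image and REST endpoint; sending a document with `PUT /tika` and `Accept: text/plain` returns the extracted text. The image name and port below are Tika's standard defaults; the filename is a placeholder. Open WebUI would then be pointed at the server via the admin settings' content extraction engine option (check the current docs for the exact setting names).

```shell
# Start the Apache Tika server on its default port.
docker run -d -p 9998:9998 apache/tika

# Extract plain text from a sample document (replace report.pdf with your file).
curl -T report.pdf http://localhost:9998/tika --header "Accept: text/plain"
```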
@tjbck commented on GitHub (Sep 21, 2024):
Closing in favour of #3129