[GH-ISSUE #715] feat: Improvements to Collections RAG #27718
Originally created by @jannikstdl on GitHub (Feb 12, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/715
Describe the solution you'd like
Collecting data from a wide range of docs and giving relevant info to an LLM is one of the most common ways to use RAG.
There is one point which would improve the experience of using RAG over collections:
For Collections it would be good to identify where the LLM cites from, so that a "source: " is mentioned in the message. (Metadata like the name of the document is stored in the backend RAG file.) <- Already implemented
@tjbck commented on GitHub (Mar 11, 2024):
Let's get the ball rolling for this. I'm open to suggestions!
@Therealkorris commented on GitHub (Mar 20, 2024):
I would love support for nougat in RAG docs :)
https://facebookresearch.github.io/nougat/
@jannikstdl commented on GitHub (Mar 20, 2024):
Maybe this could be improved more broadly. I get the idea behind Nougat; it seems like the new "LlamaIndex Parser" for academic formulas. But classic text extraction also has problems with tables, for example.
For now I only know of the LlamaIndex Parser, but that seems to be commercial. Maybe there are other such parsers.
@slash-proc commented on GitHub (Mar 25, 2024):
@bozo32 commented on GitHub (Mar 29, 2024):
Is there a way to iterate through rather than aggregate across a collection, so that we can have the LLM answer by data source? This may be useful for comparing data sources within a collection.
Is it possible to make collections conditional on a prior answer? E.g. for data sources meeting a requirement (condition), treat them as a collection.
@Entaigner commented on GitHub (Apr 14, 2024):
Is anyone else currently working on source references in the RAG system?
If not, I'd give it a shot.
@bozo32 commented on GitHub (Apr 14, 2024):
Not me, but for your info:
there is a bunch of stuff in command-r that would be phenomenal to support
https://github.com/cohere-ai/notebooks/blob/main/notebooks/Vanilla_RAG.ipynb
They seem to have tweaked their embedding model to fit with the LLM, so something that brings that in as a package may be wise.
And, going pretty nuts, I found this post on Reddit:
https://www.reddit.com/r/LocalLLaMA/comments/1bsfsc1/comment/kxlllt1/
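For context, here is a minimal sketch of the Command R citation flow that notebook demonstrates, using Cohere's chat API; the documents below are made up, and field names should be double-checked against the current SDK:

```python
import cohere

# Hedged sketch: Command R can ground its answer in supplied documents and
# return citations that point back to them with character offsets.
co = cohere.Client("YOUR_API_KEY")

response = co.chat(
    model="command-r",
    message="What does the handbook say about incident response?",
    documents=[
        {"title": "handbook.pdf", "snippet": "Incidents must be triaged within 15 minutes..."},
        {"title": "faq.md", "snippet": "Escalations go to the on-call lead."},
    ],
)

print(response.text)
# Each citation references the supplied documents by id, plus the cited span.
for c in response.citations or []:
    print(c.start, c.end, c.text, c.document_ids)
```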
@yousecjoe commented on GitHub (Apr 16, 2024):
I'm eager to help work on RAG sources. Many of my requirements for RAG and cybersecurity involve cited sources from the RAG context.
I found three significant factors controlling the type of response you get from the open-webui RAG pipeline.
The RAG_TEMPLATE* can be used to fine-tune the response further.
When working with cited sources, the first change I must make with open-webui is to choose a model that will include cited sources. Not all models will generate sources. For example, in my experience, a Modelfile (Hub/Cyber-Security-Specialist:Latest) based on Mistral and fine-tuned for cybersecurity will generate sources automatically, but Mistral:latest or OpenChat:7b does not.
Source(s) included using the default RAG_TEMPLATE prompt and a custom Modelfile from Ollama (screenshot).
Sources not included using the default RAG_TEMPLATE prompt and Openchat:7b-V3.5-0106-Fp16 (screenshot).
*Specifically, this line (L437 in backend/config.py) can be problematic when working with cited sources in the RAG pipeline: "Avoid mentioning that you obtained the information from the context."
The expected result is that the response uses the context and includes cited sources with relevancy.
So, you can get cited sources that are relevant today in open-webui, but it helps to use a specific Modelfile which is fine-tuned to the contextual data.
@yousecjoe commented on GitHub (Apr 16, 2024):
Modifying the RAG_TEMPLATE to include all my cited-source requirements has significantly improved responses and relevant cited sources. Some helpful hints are to describe how you want the LLM to provide the sources, how many sources, and what details to include about the sources. I can provide more screenshots if you want to see it in practice.
If you are not specific enough and use the wrong model, you can ask for APA-formatted citations all you want, but the LLM will be unable to complete the task for "reasons" like the LLM does not have the ability to navigate or identify specific pages within the provided context. This ties back to a good embedding strategy.
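As a rough illustration (not the exact template described above), a citation-oriented RAG_TEMPLATE could look something like the following, assuming the template still uses open-webui's [context] and [query] placeholders:

```
Use the following context to answer the query.

<context>
[context]
</context>

After your answer, list up to 3 sources you used under a "Sources:" heading.
For each source include the document name and, if available, the page number.
If the context does not contain the answer, say so instead of guessing.

Query: [query]
```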
@iMeany commented on GitHub (Apr 22, 2024):
Hi @yousecjoe,
I'm also currently experimenting with modifying the RAG template to include sources, but it doesn't always work. I agree that model choice seems to be the biggest factor (probably how the instruct model is fine-tuned).
I've found that sometimes asking for citations (and then manually looking up the document) works better than just listing the sources. Removing the line mentioned above also seems mandatory, so as not to contradict the prompt.
Maybe you can share your RAG template if you have also spent some time on this?
It would be a very valuable feature to have this as an option, as I believe there are somewhat standard ways to return sources (at least with LangChain). I see this mentioned in other issues/feature requests and I might help with the implementation once I have some time.
@Entaigner commented on GitHub (Apr 22, 2024):
I may have misunderstood the assignment.
What I want is to show the document chunks used by the RAG query in the LLM response.
And I'm still figuring out how to implement everything.
However, if any of you would like some new replacement variables for the RAG template, I'm game.
Maybe a [context_json] variable, with metadata like filename or page number added to the context.
That should make your task a little easier.
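For illustration, a rough sketch of what populating such a [context_json] variable could look like; the helper name and the chunk/metadata layout are assumptions, not existing open-webui code:

```python
import json

def build_context_json(chunks):
    """Hypothetical helper: serialize retrieved chunks plus their metadata so a
    [context_json] template variable could expose filename/page to the LLM."""
    entries = []
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        entries.append({
            "text": chunk["content"],
            # metadata keys depend on the loader; source/page are just examples
            "source": meta.get("source", "unknown"),
            "page": meta.get("page"),
        })
    return json.dumps(entries, ensure_ascii=False, indent=2)

# Example value that would be substituted for [context_json] in the RAG template
chunks = [
    {"content": "Zero trust assumes breach...", "metadata": {"source": "nist-800-207.pdf", "page": 4}},
]
print(build_context_json(chunks))
```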
@bozo32 commented on GitHub (Apr 22, 2024):
Adding it to the RAG template would be good.
It seems that the parameter to pull context/quotations may vary by LLM (e.g. Command R has its own special approach).
This may need to be inserted by us manually somewhere in the flow if placing it in the JSON does not work... then we'll just have to be competent.
Silly question, perhaps:
Is there an easy way to get RAG queries to work by document as well as across documents, so that the process iterates through a set of documents and reports (preferably in JSON) by document?
@yousecjoe commented on GitHub (Apr 26, 2024):
You didn't misunderstand. What you are doing is related to this topic. I like your approach. I add chunking strategy details to troubleshooting and evaluating the RAG pipeline performance. You're seeing the data that is crucial to us all in understanding how the response is augmented and generated, including any automatically generated response context citations.
I see at least two approaches here. One is to pull the chunk data and display it in the response, the other is to have the LLM automatically generate the citations using improvements to the RAG template. I want to explore both, but I believe displaying the chunked data and having that be formatted by the LLM into an APA style citation is the best way forward.
Background
When a response is augmented and generated, some models automatically create context citations from the RAG context. (See my previous comment for an example.) That generated response involves the chunks your solution identifies. It's helpful how your response displays the source document and page number. That's not APA format, but that's essentially what many are looking for.
@yousecjoe commented on GitHub (Apr 26, 2024):
Excellent point about how this feature will depend on the LLM being used. I was originally misguided by thinking all models generated properly formatted citations. Now, I know many do not. I'm not sure I understand the last question.
@bozo32 commented on GitHub (Apr 27, 2024):
The last comment was about reporting by document or source versus across documents or sources. Sometimes when we're doing RAG or question answering with documents, we actually don't want to pull across a bunch of sources. What we're trying to do is find out whether or not entities exist in each of the documents. For that use case, what we want to do is report results by source; this is pretty easy to do with JSON format, I think. In the academic world, a classic would be: which of the following articles contain clearly stated research questions?
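For what it's worth, a rough sketch of what that per-source iteration could look like; the `retrieve` and `ask_llm` callables are hypothetical placeholders standing in for whatever retriever and LLM client the pipeline already uses:

```python
import json

def answer_per_document(question, collection, retrieve, ask_llm, top_k=4):
    """Run the same question against each document separately and report per source,
    instead of retrieving across the whole collection at once."""
    report = {}
    for doc_name in collection:                      # e.g. ["paper_a.pdf", "paper_b.pdf"]
        chunks = retrieve(question, source=doc_name, k=top_k)
        context = "\n\n".join(c["content"] for c in chunks)
        answer = ask_llm(
            f"Using only the context from '{doc_name}', answer: {question}\n\n{context}"
        )
        report[doc_name] = answer                    # one entry per source document
    return json.dumps(report, indent=2)
```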
@iMeany commented on GitHub (Apr 27, 2024):
I agree with @yousecjoe, but I think it's even more complicated than that (more than two approaches).
Models fine-tuned with sources
Some models do return formatted sources (this also depends on how instruct is fine-tuned, how the RAG is built, how context is passed, and what is in the RAG prompt template), but in a general sense, the context could just be a bunch of text with no clear way for the model to reference it other than repeating it (especially since how context is passed is often unknown to the user).
AFAIK a popular approach is adding some meta information to the document chunks when embedding and passing context (document name, page number, etc.) and asking the model to return it. This depends on the LLM and the template/implementation, so it is not easy to implement for every configuration and wouldn't work as an overall solution without adding all of that as instruction/formatting per LLM, but there might be some improvements that generally help with referencing information.
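A minimal sketch of that "metadata at embedding time" idea, assuming a Chroma-style vector store (the collection name, ids, and metadata keys below are made up):

```python
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("my_collection")

# Store each chunk together with its metadata so it travels with the embedding.
collection.add(
    ids=["report-p4-c0"],
    documents=["Zero trust assumes breach and verifies each request..."],
    metadatas=[{"source": "nist-800-207.pdf", "page": 4}],
)

# At query time the metadata comes back with each hit, so it can be injected
# into the prompt (or a [context_json]-style variable) for the model to cite.
hits = collection.query(query_texts=["what does zero trust assume?"], n_results=3)
for doc, meta in zip(hits["documents"][0], hits["metadatas"][0]):
    print(meta["source"], meta.get("page"), "->", doc[:60])
```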
Returning similar documents/chunks
What is somewhat universal, and could work with any LLM in a RAG scenario, would be to use some similarity metric and return a list of sources that are similar enough (above some threshold) to the LLM's answer. Not sure how it's built, but AnythingLLM returns some sources, and I don't think it's the prompt/model itself doing it.
The naive way would be to take the LLM's answer (maybe together with the question/prompt), use a cosine/Euclidean distance lookup against the text chunks, return the top N that are above a threshold, and have a standard way to add that to the end of the message (see the sketch below).
One downside is that, in theory, there are edge cases where the returned sources don't match the LLM's response or the information it actually used (maybe information got lost in a very long context, or the base model's bias is too strong / the prompt is bad and it didn't answer using only the context, etc.). Also, in the case @bozo32 mentioned, where we are checking whether information is in the context at all, it would still return whatever is closest, so it depends on the threshold.
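A minimal sketch of that naive similarity lookup; the `embed` function and the chunk layout are assumptions, standing in for whatever embedding function and retrieval store the pipeline already uses:

```python
import numpy as np

def cite_sources(answer, chunks, embed, top_n=3, threshold=0.75):
    """Embed the LLM's answer, compare it to already-embedded chunks by cosine
    similarity, and return the closest sources above the threshold."""
    a = np.asarray(embed(answer), dtype=float)
    a = a / np.linalg.norm(a)
    scored = []
    for chunk in chunks:  # each chunk: {"text": ..., "source": ..., "embedding": [...]}
        v = np.asarray(chunk["embedding"], dtype=float)
        sim = float(a @ (v / np.linalg.norm(v)))
        if sim >= threshold:
            scored.append((sim, chunk["source"]))
    scored.sort(reverse=True)
    return [src for _, src in scored[:top_n]]

# message += "\n\nSources: " + ", ".join(cite_sources(llm_answer, retrieved_chunks, embed_fn))
```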
@yousecjoe commented on GitHub (May 1, 2024):
Excellent points!
I have been thinking a lot about this topic and found this medium article, which I added to our discord for discussion there, too.
https://medium.com/@codegpt/evaluation-of-an-rag-system-the-saga-5cea918fbc47
The article is very good at explaining RAG evaluation. I have found this research lacking in our space. I want to see more comparisons of all these experimental RAG middleware solutions.
Here is my brain dump:
I am curious to know what the end goal is for open-webui RAG. I am starting to see some projects stand out for some particular RAG features.
I now have a Docker container per service, following a micro-service architecture. This works for me and mirrors existing architectures.
My open-webui hands off embedding to an Ollama container dedicated to embedding, then another one dedicated to the LLM, and a third, fourth, or more containers for specific RAG features, vector databases, etc. Not many projects include this level of control today, and IMHO it is the way to go for a professional solution. This level of control should be different from what a consumer solution exposes.
This aligns with open-webui, focusing on being the best gateway to services while not trying to provide the best specific RAG PDF solution or other specific RAG features.
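A rough sketch of that container-per-service layout; the image names are real, but the ports and service names are illustrative, and the embedding endpoint would still be selected in the admin settings rather than via this file:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3000:8080"]
    environment:
      - OLLAMA_BASE_URL=http://ollama-llm:11434   # chat/completion backend
  ollama-llm:        # Ollama instance dedicated to LLMs
    image: ollama/ollama
  ollama-embed:      # Ollama instance dedicated to embedding models
    image: ollama/ollama
  chromadb:          # optional external vector database
    image: chromadb/chroma
```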
Is it better to have RAG that isn't the best? Or to say we have a simple (naive) RAG pipeline, but we recommend the XYZ partner solution for the ABC use case?
What do you all feel when you think about this?
Until the killer RAG app is developed, all these generative AI front ends must have multiple RAG solutions to choose from. Through hands-on experience, the end user learns quickly; there is no silver bullet one-size-fits-all solution for RAG yet.
I keep thinking there are so many variables any user should have control over in the RAG pipeline that I need to create a product requirements document for RAG.
@bozo32 commented on GitHub (May 2, 2024):
Hi
For people who are building very specific RAG setups, something like Flowise may be better.
These (Langflow/Flowise) allow granular control and a really idiot-friendly way to see what is going on.
Perhaps create an Ollama RAG flow and link to Flowise in the instructions?
No point in trying to do everything….
I’m currently running both beside openwebui
-peter
@MarlNox commented on GitHub (May 8, 2024):
Any guide on integrating OpenWebUI with Flowise?
@bozo32 commented on GitHub (May 8, 2024):
There is no guide for integration with open-webui, nor should there be. Both Flowise and Langflow allow you to drop quite a number of different LLMs into their flows. One option, both for chat and embedding, is Ollama. I use open-webui to manage Ollama... and access Ollama directly from Langflow/Flowise.
My comment was intended to suggest that, perhaps, open-webui should not try to duplicate their efforts. What they are doing here is great for starting up... but for folks whose needs are more demanding, perhaps we should move to another platform.
@justinh-rahb commented on GitHub (May 8, 2024):
I completely agree that it's crucial to utilize the tools that best suit individual needs. As developers, we're creating a solution that caters to our own requirements, which, fortunately, also resonates with a large following. If, however, our project falls short of addressing a specific requirement that another project excels in, then it's only logical to leverage that alternative solution. It's essential to recognize that attempting to cater to every individual's needs often results in a lack of satisfaction across the board. Our focus is on providing a tailored solution, rather than trying to be everything to everyone.
@yousecjoe commented on GitHub (May 12, 2024):
I also agree with this point of view, as well as the one from Justin.
It is increasingly difficult to compete with funded startups that are focused entirely on RAG features.
It is wise for this project to focus on interoperability and integration with those funded projects. This positions open-webui as the best "gateway" solution.
@Hugobox commented on GitHub (Jun 10, 2024):
Hi! I'd like to add a little feature request: it would be nice if URLs in the citation sources were clickable.
Edit: For added context, we are using LangChain to download Confluence pages, and for each page we get one line that contains page_content and metadata. In the metadata there is the URL for the original page. Metadata used to show up in the citation pop-up in version 0.2.4, but no longer does so systematically in v0.3.1. Is there a way to get the metadata back (with clickable links as a bonus)?
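For illustration, this is the LangChain Document shape being described, with the original page URL carried in the metadata; the keys and values here are made up, and the exact metadata keys depend on the loader version:

```python
from langchain_core.documents import Document

# One Document per Confluence page: page_content plus metadata.
doc = Document(
    page_content="How we run incident retrospectives...",
    metadata={
        "title": "Incident Retrospectives",
        "source": "https://example.atlassian.net/wiki/spaces/ENG/pages/123",
    },
)

# Rendering metadata["source"] as a hyperlink in the citation pop-up would
# give the clickable links requested above.
print(doc.metadata["source"])
```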
@silentoplayz commented on GitHub (Jul 7, 2024):
Related - https://github.com/open-webui/open-webui/pull/3690
@tjbck commented on GitHub (Aug 22, 2024):
Closing in favour of #3527
@mdlmarkham commented on GitHub (Sep 12, 2024):
I like the use case of using OpenWebUI as an Ollama Router that provides LLM services to Flowise. I think that there are some other use cases that should be considered including:
Flowise is geared toward being the AI stack on the back-end of a website... its end-user UI functionality seems to be targeted at testing and making it easy to embed the Flow into a website. https://docs.flowiseai.com/using-flowise/embed
@BigFoxMedia commented on GitHub (Jan 16, 2025):
Does anybody have any suggestions on how to use OpenWebUI as the chat interface in front of FlowiseAI chats? I think what OpenWebUI excels at is not just being a gateway to 'simple' RAG platforms, but serving as a generalized UI for chats in front of more flexible agentic frameworks like Flowise. Hope you guys will reconsider :)
@qdrddr commented on GitHub (Jul 23, 2025):
I think the current workaround is to consume LangFlow as an MCP Server with OpenWebUI