Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-09 23:35:09 -05:00)
feat: RAG support #12
Originally created by @tjbck on GitHub (Oct 25, 2023).
Originally assigned to: @tjbck on GitHub.
@JDRay42 commented on GitHub (Nov 6, 2023):
Document parsing should (probably) be implemented as an external service that can be called (Apache Tika, etc.). Similarly the RAG workflow, which involves a lot of "knowns" (vectorizer, vector store, block size, etc.). But having support for initiating those functions in the UI would be nice.
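The external-service idea above can be sketched in plain Python against Apache Tika's server API (PUT a document to `/tika` with `Accept: text/plain` to get extracted text back). The `localhost:9998` address is Tika server's default port, assumed here for illustration; this is a sketch of the approach, not Open WebUI's actual implementation:

```python
# Sketch: delegating document parsing to an external Apache Tika server.
# Assumes a Tika server running at localhost:9998 (its default port).
import urllib.request

TIKA_URL = "http://localhost:9998/tika"  # Tika server's plain-text extraction endpoint

def build_parse_request(file_bytes, content_type="application/pdf"):
    """Build the PUT request Tika expects: raw document bytes in the
    body, with an Accept header selecting plain-text extraction."""
    return urllib.request.Request(
        TIKA_URL,
        data=file_bytes,
        method="PUT",
        headers={"Content-Type": content_type, "Accept": "text/plain"},
    )

def parse_document(file_bytes):
    """Send the document to Tika and return the extracted plain text.
    Only works when a Tika server is actually running."""
    req = build_parse_request(file_bytes)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Keeping parsing behind an HTTP boundary like this means the UI only needs to know one endpoint, while the parser can be scaled or swapped independently.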
@justinh-rahb commented on GitHub (Jan 7, 2024):
YES! HOLY GRAIL DIVINE TRINITY ACHIEVED! We've got external APIs, we've got local AI, and now RAG 🚀

@justinh-rahb commented on GitHub (Jan 7, 2024):
Doesn't seem to work with .docx files, is that a known issue?
@tjbck commented on GitHub (Jan 7, 2024):
@justinh-rahb Added support for .docx files with #418, try it out!
@justinh-rahb commented on GitHub (Jan 7, 2024):
Will do as soon as the docker image finishes building 👌
@justinh-rahb commented on GitHub (Jan 7, 2024):
.docx is working now, excellent work! There seems to be an issue with starting a new chat and uploading a document that has already been uploaded to another chat: it doesn't complete.
@tjbck commented on GitHub (Jan 7, 2024):
@justinh-rahb If you just updated the container, it might take a while to download the embedding model weights. You might also want to clear your browser cache. Let me know if the issue persists!
@justinh-rahb commented on GitHub (Jan 7, 2024):
Ah yes you may be right, seems to be working now several minutes later. Will need to note that for next time I re-pull the image.
EDIT: Is there a directory that can be persistent volume mounted to prevent redownloading of weights?
@justinh-rahb commented on GitHub (Jan 8, 2024):
Congratulations are in order Timothy, you really knocked this one out of the park. I can say without a hint of exaggeration that this is the best and easiest local AI interface going right now for chat, and now RAG too. I've tried a lot of them, I've even done my own forks of Chatbot-UI (who hasn't at this point), and none of them measure up. I don't know if this is quite production-ready for most folks yet, but it's good enough for me and my small team to hammer away on to see what comes loose 👍
@tjbck commented on GitHub (Jan 8, 2024):
@justinh-rahb Thank you so much for your kind words. I'm glad you and your team are finding this project useful, it means a lot to me! I would also like to take this opportunity to thank you for being a part of our journey and helping me with the troubleshooting. We're just getting started, and there will be a lot more to come! Stay tuned for more updates!🌟
Also! FYI, the docker image has been updated to have the weights baked into the container, so you won't have to redownload them every time you update.
@jukofyork commented on GitHub (Jan 12, 2024):
Yeah, just trying this now and it seems really useful!
@gitmybox commented on GitHub (Jan 13, 2024):
I have tried many other local ChatPDF tools, and none of them comes close to Ollama Web UI. The GUI is intuitive to navigate and the document upload process (image, PDF, DOC) is reliable. Great effort & very well done.
As a suggestion, for the text-based documents uploaded, it would be good to include the source of the extracted sentences, lines, pages, etc. in the chat reply. I believe this can be obtained from the metadata generated during the text-splitting and embedding process. This extra info could be displayed as another icon beside the currently existing icons for each message.
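The metadata idea above can be sketched in plain Python. The chunk fields (`page`, `source`) and the file name are illustrative assumptions, not Open WebUI's actual schema:

```python
# Toy illustration: carry source metadata alongside each chunk during
# text splitting, so a retrieved passage can cite its origin in the UI.

def split_with_metadata(pages, chunk_size=80):
    """Split a list of (page_number, text) pairs into fixed-size chunks,
    attaching the page number and file name to each chunk."""
    chunks = []
    for page_no, text in pages:
        for start in range(0, len(text), chunk_size):
            chunks.append({
                "text": text[start:start + chunk_size],
                "page": page_no,
                "source": "report.pdf",  # hypothetical file name
            })
    return chunks

def format_citation(chunk):
    """Render the citation string a UI could show beside a reply."""
    return f'{chunk["source"]}, p. {chunk["page"]}'

pages = [(1, "Introduction. " * 10), (2, "Conclusion. " * 10)]
chunks = split_with_metadata(pages)
print(format_citation(chunks[0]))  # report.pdf, p. 1
```

Because the metadata travels with the chunk into the vector store, it comes back for free at retrieval time, which is exactly when the UI needs it.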
@gitmybox commented on GitHub (Jan 13, 2024):
Another possible scenario regarding document upload is audio-based data. There is a use case where a user uploads an audio file recorded during a meeting, class lesson, interview, etc. It would be very useful to chat with this type of transcribed audio seamlessly, without searching up and down the recording.
@oliverbob commented on GitHub (Jan 13, 2024):
@tjbck thanks for the addition of the document section mate. Could we probably add a functionality to hide private docs from users (or is this the default behavior)? Just pulled the most recent update. Tried it only in the admin account.
@justinh-rahb commented on GitHub (Jan 13, 2024):
@oliverbob The Documents button in the sidebar is only available to admins. I believe the intention here is that you can upload some documents that are globally available to all users, and anything a user drops into their chat box themselves is only for them. @tjbck can correct me if I've gotten any of the above wrong.
@tjbck commented on GitHub (Jan 14, 2024):
@gitmybox sounds like a great idea, will try to take a look and see what can be done!
@oliverbob @justinh-rahb's explanation is on point: the documents page is intended to be used by admins to add documents and make them globally available to all users. Users with the 'user' role can still drag and drop their files to use the RAG feature; those files just won't be available to all users like the documents on the documents page. Hope that clarifies things a bit.
@oliverbob commented on GitHub (Jan 15, 2024):
Thanks for the clarification.
@gitmybox commented on GitHub (Jan 17, 2024):
I have noticed that Ollama Web-UI uses the CPU to embed PDF documents, while the chat conversation uses the GPU if there is one in the system.
However, in some testing I did in the past with PrivateGPT, I remember both PDF embedding and chat used the GPU when available, yet the embedding performance in PrivateGPT was very, very slow.
My understanding is that a GPU should by rights be better than a CPU at any LLM task, particularly embedding. But Ollama Web-UI uses the CPU instead, with very good results. That puzzles me....
I would appreciate it if anyone can share some insights on whether CPU or GPU embedding is better for speed.
@justinh-rahb commented on GitHub (Jan 17, 2024):
Embedding models are relatively lightweight, and the performance enhancement from running them on a GPU is often minimal for smaller ones. In some cases, they may even run more slowly on a GPU than on a CPU due to the overhead of shuffling data around.
Given this, it seems reasonable that Ollama WebUI does not currently support GPU acceleration for text-embedding; implementing such a feature would likely require no small amount of effort for what might only benefit a limited number of users, and questionably at that.
However, if someone were to submit a well-crafted pull request demonstrating reliable GPU support for the Docker image on compatible systems, it would undoubtedly receive serious consideration.
@gitmybox commented on GitHub (Jan 17, 2024):
@justinh-rahb thanks for the explanation. Your mention of the overhead of shuffling data around got me thinking about memory. My system's GPU has 8GB VRAM but the CPU has 32GB RAM (4 times more). That could explain why CPU embedding so dramatically outperforms the GPU in my use case. It is perfectly fine with me that Ollama Web-UI uses CPU embedding.
@amirvenus commented on GitHub (Apr 1, 2024):
While this has been a really great feature and I really appreciate it, I find the manual document upload rather tedious.
I was wondering if admins could make available a vector database in which they have already stored [large] embeddings?
Thanks
@araffin commented on GitHub (Jul 26, 2024):
@tjbck for clarification, is there any way currently to have full document parsing/text upload? (as suggested in https://github.com/open-webui/open-webui/issues/60, the same way it is done in https://huggingface.co/chat/).
What I mean is that currently it is a RAG system, so the document is parsed, embedded, and then chunks are retrieved.
With models with longer context (e.g. llama 3.1) it would be nice to be able to give the whole document as input (and not just chunks, so just parse and give as input).
As a concrete example, it would be nice to upload a PDF and ask for a summary section by section.
With the current system, it only works partially as only part of the document is retrieved.
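The distinction @araffin describes can be sketched in plain Python. The keyword-overlap score below is a crude stand-in for real embedding similarity, and the chunk size is arbitrary; the point is only that the RAG path hands the model a subset of the document, while the full-document path hands it everything:

```python
# Toy contrast between RAG-style retrieval (only top-k chunks reach the
# model) and full-document upload (the whole parsed text reaches the model).

def chunk(text, size=40):
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def score(query, passage):
    """Crude relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def rag_context(query, text, k=2):
    """RAG path: only the k best-scoring chunks are sent to the model,
    so a 'summarize section by section' request sees a partial document."""
    chunks = chunk(text)
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def full_context(text):
    """Full-document path: the whole parsed text is sent to the model,
    which long-context models (e.g. Llama 3.1) can accommodate."""
    return [text]

doc = "Section one covers setup. " * 5 + "Section two covers results. " * 5
retrieved = rag_context("summarize the results", doc)
assert sum(len(c) for c in retrieved) < len(doc)           # RAG sees a subset
assert sum(len(c) for c in full_context(doc)) == len(doc)  # full doc seen
```

This is why section-by-section summarization only partially works under retrieval: chunks that never score highly against the query simply never reach the model.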
@I-I-IT commented on GitHub (Dec 22, 2024):
Hello, how can I use this with the standard command-line Ollama?