feat: RAG support #12

Closed
opened 2025-11-11 14:01:53 -06:00 by GiteaMirror · 23 comments

Originally created by @tjbck on GitHub (Oct 25, 2023).

Originally assigned to: @tjbck on GitHub.

@JDRay42 commented on GitHub (Nov 6, 2023):

Document parsing should (probably) be implemented as an external service that can be called (Apache Tika, etc.). Similarly the RAG workflow, which involves a lot of "knowns" (vectorizer, vector store, block size, etc.). But having support for initiating those functions in the UI would be nice.
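
As a sketch of the workflow described above (external parsing plus a fixed-block-size chunking step), something like the following could work. Apache Tika Server really does expose a `PUT /tika` endpoint returning extracted text, but the URL, block size, and overlap below are illustrative assumptions, not Ollama WebUI's actual settings.

```python
# Hedged sketch of the parse -> chunk stage of a RAG pipeline.
# Assumptions: a running Apache Tika Server at TIKA_URL; block_size
# and overlap values are illustrative only.

def chunk_text(text: str, block_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size, overlapping blocks ready for embedding."""
    if block_size <= overlap:
        raise ValueError("block_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + block_size])
        start += block_size - overlap
    return chunks

if __name__ == "__main__":
    import requests  # pip install requests

    TIKA_URL = "http://localhost:9998/tika"  # assumed local Tika Server
    with open("report.pdf", "rb") as f:
        resp = requests.put(TIKA_URL, data=f, headers={"Accept": "text/plain"})
    for chunk in chunk_text(resp.text):
        pass  # embed each chunk and write it to the vector store here
```

The overlap keeps sentences that straddle a block boundary retrievable from at least one chunk.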

@justinh-rahb commented on GitHub (Jan 7, 2024):

YES! HOLY GRAIL DIVINE TRINITY ACHIEVED! We've got external APIs, we've got local AI, and now RAG 🚀
[Screenshot: 2024-01-07 at 2:46:34 PM]

@justinh-rahb commented on GitHub (Jan 7, 2024):

Doesn't seem to work with .docx files, is that a known issue?

@tjbck commented on GitHub (Jan 7, 2024):

@justinh-rahb Added support for .docx files with #418, try it out!

@justinh-rahb commented on GitHub (Jan 7, 2024):

> @justinh-rahb Added support for .docx files with #418, try it out!

Will do as soon as the docker image finishes building 👌

@justinh-rahb commented on GitHub (Jan 7, 2024):

.docx is working now, excellent work! There does seem to be an issue with starting a new chat and uploading a document that has already been uploaded to another chat: it doesn't complete.

@tjbck commented on GitHub (Jan 7, 2024):

@justinh-rahb If you just updated the container, it might take a while to download the embedding model weights. You might also want to clear your browser cache. Let me know if the issue persists!

@justinh-rahb commented on GitHub (Jan 7, 2024):

> @justinh-rahb If you just updated the container, it might take a while to download the embedding model weights. You might also want to clear your browser cache. Let me know if the issue persists!

Ah yes, you may be right; it seems to be working now, several minutes later. I'll need to note that for the next time I re-pull the image.

EDIT: Is there a directory that can be persistent volume mounted to prevent redownloading of weights?
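
For anyone else wondering about persisting the weights, a volume mounted over the container's model cache should work. A sketch, assuming the weights land under the container's default Hugging Face/sentence-transformers cache directory and using the project's image name at the time; verify both against your actual setup:

```shell
# Hypothetical example: persist the embedding-model cache across image
# re-pulls by mounting a named volume over the in-container cache dir.
# The /root/.cache path is an assumption; confirm where weights land,
# e.g. with `docker exec <container> ls /root/.cache`.
docker run -d \
  -v ollama-webui-cache:/root/.cache \
  -p 3000:8080 \
  ghcr.io/ollama-webui/ollama-webui:main
```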

@justinh-rahb commented on GitHub (Jan 8, 2024):

Congratulations are in order Timothy, you really knocked this one out of the park. I can say without a hint of exaggeration that this is the best and easiest local AI interface going right now for chat, and now RAG too. I've tried a lot of them, I've even done my own forks of Chatbot-UI (who hasn't at this point), and none of them measure up. I don't know if this is quite production ready for most folks yet, but it's good enough for me and my small team to hammer away on and see what comes loose 👍

@tjbck commented on GitHub (Jan 8, 2024):

@justinh-rahb Thank you so much for your kind words; I'm glad you and your team are finding this project useful, it means a lot to me! I'd also like to take this opportunity to thank you for being a part of our journey and helping me with the troubleshooting. We're just getting started, and there will be a lot more to come. Stay tuned for more updates! 🌟

Also, FYI: the docker image has been updated to have the weights baked into the container, so you won't have to redownload them every time you update.

@jukofyork commented on GitHub (Jan 12, 2024):

Yeah, just trying this now and it seems really useful!

@gitmybox commented on GitHub (Jan 13, 2024):

I have tried many other local ChatPDF tools, and none of them comes close to Ollama Web UI. The GUI is intuitive to navigate, and the document upload process (image, PDF, doc) is reliable. Great effort, very well done.

As a suggestion, for uploaded text-based documents it would be good to include the source of the extracted sentences, lines, pages, etc. in the chat reply. I believe this can be obtained from the metadata during the text-splitting and embedding process. This extra info could be displayed as another icon beside the existing icons for each message.
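
The suggestion above could be prototyped by carrying page/source metadata through the splitting step, so each retrieved chunk knows where it came from. A minimal sketch; the field names are made up for illustration and are not Ollama WebUI internals:

```python
# Hedged sketch: keep source metadata attached to each chunk during
# text splitting, so retrieval results can cite file and page.
# The dict shape ("source", "page", "text") is illustrative only.

def split_pages(pages: list[str], source: str, block_size: int = 200) -> list[dict]:
    """Split per-page text into chunks, each tagged with its origin."""
    chunks = []
    for page_no, page_text in enumerate(pages, start=1):
        for start in range(0, len(page_text), block_size):
            chunks.append({
                "source": source,
                "page": page_no,
                "text": page_text[start:start + block_size],
            })
    return chunks

def citation(chunk: dict) -> str:
    """Render the citation string a UI icon could reveal."""
    return f'{chunk["source"]}, p. {chunk["page"]}'

chunks = split_pages(["first page text", "second page text"], "report.pdf")
print(citation(chunks[-1]))  # -> report.pdf, p. 2
```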

@gitmybox commented on GitHub (Jan 13, 2024):

Another possible scenario for document upload is audio-based data. There is a use case where a user uploads an audio file recorded during a meeting, class lesson, interview, etc. It would be very useful to chat with this kind of voice-converted text seamlessly, without searching up and down the recording.
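
The audio use case above is essentially transcription plus the existing RAG pipeline. Actual speech-to-text would come from a model such as Whisper (not shown); the sketch below covers only the later step, turning timestamped transcript segments into chunks that keep their time range so an answer can point back into the recording. All names here are hypothetical:

```python
# Hedged sketch: group timestamped transcript segments (the kind a
# speech-to-text model like Whisper emits) into chunks that remember
# their time span, so RAG answers can link back into the recording.

def chunk_transcript(segments: list[dict], max_chars: int = 300) -> list[dict]:
    """Merge consecutive segments up to max_chars, keeping start/end times."""
    chunks, buf, t0 = [], [], None
    for seg in segments:
        if t0 is None:
            t0 = seg["start"]
        buf.append(seg["text"])
        if sum(len(t) for t in buf) >= max_chars:
            chunks.append({"start": t0, "end": seg["end"], "text": " ".join(buf)})
            buf, t0 = [], None
    if buf:
        chunks.append({"start": t0, "end": segments[-1]["end"], "text": " ".join(buf)})
    return chunks

segments = [
    {"start": 0.0, "end": 4.2, "text": "Welcome to the quarterly meeting."},
    {"start": 4.2, "end": 9.8, "text": "First item: the budget review."},
]
print(chunk_transcript(segments)[0]["start"])  # -> 0.0
```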

@oliverbob commented on GitHub (Jan 13, 2024):

@tjbck thanks for the addition of the document section mate. Could we perhaps add functionality to hide private docs from users (or is this the default behavior)? Just pulled the most recent update. Tried it only in the admin account.

@justinh-rahb commented on GitHub (Jan 13, 2024):

> @tjbck thanks for the addition of the document section mate. Could we perhaps add functionality to hide private docs from users (or is this the default behavior)? Just pulled the most recent update. Tried it only in the admin account.

@oliverbob The Documents button in the sidebar is only available to admins. I believe the intention here is that you can upload some documents that are globally available to all users, and anything a user drops into their chat box themselves is only for them. @tjbck can correct me if I've gotten any of the above wrong.

@tjbck commented on GitHub (Jan 14, 2024):

@gitmybox sounds like a great idea, I'll try to take a look and see what can be done!

@oliverbob @justinh-rahb's explanation is on point: the documents page is intended to be used by admins to add documents and make them globally available to all users. Users with the 'user' role can still drag and drop their files to use the RAG feature; those files just won't be available to all users like the documents on the documents page. Hope that clarifies things a bit.
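
The visibility model described here, with admin documents global and user uploads private, reduces to a simple filter. A toy sketch of that rule; the field names are hypothetical, not the project's schema:

```python
# Hedged sketch of the described visibility rule: documents added via the
# admin Documents page are global; a user's own drag-and-drop uploads are
# visible only to that user. Field names are illustrative.

def visible_docs(user: str, docs: list[dict]) -> list[dict]:
    return [d for d in docs if d["scope"] == "global" or d["owner"] == user]

docs = [
    {"name": "handbook.pdf", "scope": "global", "owner": "admin"},
    {"name": "notes.docx", "scope": "private", "owner": "alice"},
]
print([d["name"] for d in visible_docs("bob", docs)])  # -> ['handbook.pdf']
```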

@oliverbob commented on GitHub (Jan 15, 2024):

Thanks for the clarification.

@gitmybox commented on GitHub (Jan 17, 2024):

I have noticed that Ollama Web-UI uses the CPU to embed PDF documents, while the chat conversation uses the GPU if there is one in the system.

However, in some testing I did in the past with PrivateGPT, I remember both PDF embedding and chat using the GPU when one was available. But the embedding performance in PrivateGPT was very, very slow.

My understanding is that a GPU should in principle outperform a CPU at any LLM task, particularly embedding. Yet Ollama Web-UI uses the CPU with very good results. That puzzles me...

I'd appreciate it if anyone could share some insight on whether CPU or GPU embedding is faster.

@justinh-rahb commented on GitHub (Jan 17, 2024):

Embedding models are relatively lightweight, and the performance enhancement from running them on a GPU is often minimal for smaller ones. In some cases, they may even run more slowly on a GPU than on a CPU due to the overhead of shuffling data around.

Given this, it seems reasonable that Ollama WebUI does not currently support GPU acceleration for text-embedding; implementing such a feature would likely require no small amount of effort for what might only benefit a limited number of users, and questionably at that.

However, if someone were to submit a well-crafted pull request demonstrating reliable GPU support for the Docker image on compatible systems, it would undoubtedly receive serious consideration.
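
For anyone who wants to measure the trade-off on their own hardware rather than guess, a rough benchmark along these lines would settle it. The model name and batch size are assumptions, and this uses the sentence-transformers API, which may differ from whatever Ollama WebUI embeds with internally:

```python
# Hedged benchmark sketch: time the same embedding workload on CPU vs GPU.
# Requires `pip install sentence-transformers` and, for "cuda", a GPU;
# the model choice is illustrative, not Ollama WebUI's actual embedder.
import time

def throughput(n_docs: int, seconds: float) -> float:
    """Documents embedded per second."""
    return n_docs / seconds

if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer

    docs = ["an example chunk of document text to embed"] * 256
    for device in ("cpu", "cuda"):
        model = SentenceTransformer("all-MiniLM-L6-v2", device=device)
        start = time.perf_counter()
        model.encode(docs, batch_size=32)
        elapsed = time.perf_counter() - start
        print(f"{device}: {throughput(len(docs), elapsed):.1f} docs/s")
```

For small models, the CPU-only run often wins once you account for model load and host-to-device transfer time, which is consistent with the explanation above.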

@gitmybox commented on GitHub (Jan 17, 2024):

> Embedding models are relatively lightweight, and the performance enhancement from running them on a GPU is often minimal for smaller ones. In some cases, they may even run more slowly on a GPU than on a CPU due to the overhead of shuffling data around.
>
> Given this, it seems reasonable that Ollama WebUI does not currently support GPU acceleration for text-embedding; implementing such a feature would likely require no small amount of effort for what might only benefit a limited number of users, and questionably at that.
>
> However, if someone were to submit a well-crafted pull request demonstrating reliable GPU support for the Docker image on compatible systems, it would undoubtedly receive serious consideration.

@justinh-rahb thanks for the explanation. You mentioned the overhead of shuffling data around, which got me thinking about memory. My system's GPU has 8 GB of VRAM but the CPU has 32 GB of RAM (four times more). That could explain why CPU embedding so outperforms the GPU in my use case. It's perfectly fine with me that Ollama Web-UI uses CPU embedding.

@amirvenus commented on GitHub (Apr 1, 2024):

While this has been a really great feature and I really appreciate it, I find the manual document upload rather tedious.

I was wondering if admins could make available a vector database in which they have already stored [large] embeddings?

Thanks

@araffin commented on GitHub (Jul 26, 2024):

@tjbck for clarification, is there any way currently to have full document parsing/text upload? (as suggested in https://github.com/open-webui/open-webui/issues/60, the same way it is done in https://huggingface.co/chat/).

What I mean is that currently it is a RAG system: the document is parsed, embedded, and then chunks are retrieved.
With longer-context models (e.g. Llama 3.1) it would be nice to be able to give the whole document as input (not just chunks; just parse the document and pass it in).

As a concrete example, it would be nice to upload a PDF and ask for a summary section by section.
With the current system, this only works partially, since only part of the document is retrieved.
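
A simple way to frame the request above: stuff the whole document into the prompt when it fits the model's context window, and fall back to chunk retrieval when it doesn't. A rough sketch; the four-characters-per-token estimate and the window sizes are coarse assumptions, not measured values:

```python
# Hedged sketch of a "full document vs. RAG" switch: estimate the token
# count and only fall back to chunk retrieval when the document would not
# fit the model's context window. chars_per_token ~ 4 is a rough heuristic.

def fits_in_context(text: str, context_tokens: int,
                    reserved_tokens: int = 2048,
                    chars_per_token: float = 4.0) -> bool:
    """True if the whole text plausibly fits alongside prompt and answer."""
    estimated = len(text) / chars_per_token
    return estimated <= context_tokens - reserved_tokens

# e.g. Llama 3.1 advertises a 128k-token context window
doc = "word " * 50_000                 # ~250k chars, ~62k estimated tokens
print(fits_in_context(doc, 128_000))   # -> True
print(fits_in_context(doc, 8_192))     # -> False
```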

@I-I-IT commented on GitHub (Dec 22, 2024):

Hello, how can I use this with the standard command-line Ollama?


Reference: github-starred/open-webui#12