enh: client side web crawling for RAG #1069

Closed
opened 2025-11-11 14:36:40 -06:00 by GiteaMirror · 6 comments

Originally created by @arjunkrishna on GitHub (May 29, 2024).

"#" usage for websites and YouTube videos should download the content on the client side and then run RAG on it on the server side.

If the server running Open WebUI is blocked from accessing the internet, this functionality does not work. It would really help if the content were downloaded on the client side and then added to RAG by the server-side code.

Discussed in https://github.com/open-webui/open-webui/discussions/1959

Originally posted by arjunkrishna May 3, 2024
In the first step of YouTube transcription, where we add the URL after #, is the transcript fetched from YouTube on the client side, or does the download of the transcript happen on the server side?


@tjbck commented on GitHub (May 30, 2024):

Great idea! PR welcome!


@arjunkrishna commented on GitHub (May 30, 2024):

Unfortunately I'm not a Python or Svelte developer :) Is anyone willing to take up this change?


@cheahjs commented on GitHub (May 30, 2024):

This is not very feasible outside of very narrow use cases. Because the client is a browser, the target website would need to send the correct CORS headers (https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) to allow fetching, which is normally only the case when exposing APIs.
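
To illustrate the CORS point, here is a minimal TypeScript sketch of the check a browser applies before letting page script read a cross-origin response. It is a simplified model, not the browser's actual algorithm: it covers only a plain GET without credentials and ignores preflights and redirects. The function name `corsAllowsRead` and the origin `http://localhost:3000` are assumptions made for this example, not anything from Open WebUI.

```typescript
// Simplified model of the browser's CORS check for a simple GET request
// sent without credentials. Hypothetical helper for illustration only.
function corsAllowsRead(
  pageOrigin: string, // origin of the page running the script
  targetUrl: string, // URL the script wants to fetch
  allowOriginHeader: string | null, // Access-Control-Allow-Origin value, if any
): boolean {
  const targetOrigin = new URL(targetUrl).origin;

  // Same-origin requests are not subject to CORS.
  if (targetOrigin === pageOrigin) return true;

  // Cross-origin without the header: the response is hidden from the script.
  if (allowOriginHeader === null) return false;

  // The header must be the wildcard or match the page's origin exactly.
  const value = allowOriginHeader.trim();
  return value === "*" || value === pageOrigin;
}

// A typical article page sends no CORS header, so the read is blocked.
const ordinaryPage = corsAllowsRead(
  "http://localhost:3000",
  "https://example.com/article",
  null,
);

// A public API that opts in with a wildcard can be read.
const publicApi = corsAllowsRead(
  "http://localhost:3000",
  "https://api.example.com/data",
  "*",
);

console.log({ ordinaryPage, publicApi });
```

Under this model, most ordinary web pages fall into the first case, which is why fetching arbitrary sites from page script tends to fail.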


@tjbck commented on GitHub (May 30, 2024):

Correct me if I'm wrong but I believe this might be doable using a chrome extension.


@arjunkrishna commented on GitHub (May 30, 2024):

How does the existing RAG for documents work? Could this be done with JavaScript on the client side, where it fetches the page's content or the YouTube transcript, saves it into the browser's localStorage if needed, and then uploads it as <webpage/youtubelink url>.txt to the RAG used for documents?
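
As a rough sketch of the packaging step in this proposal, the TypeScript below turns a URL into a `.txt` filename and bundles it with already-fetched text. Everything here is hypothetical: `FetchedDocument`, `urlToTxtFilename`, and `toDocument` are names invented for the example, and the actual fetch and the upload to Open WebUI are left out because the thread does not specify them.

```typescript
// Hypothetical shape for text that the client fetched and wants to hand
// to the server-side document RAG.
interface FetchedDocument {
  filename: string;
  sourceUrl: string;
  text: string;
}

// Derive a filesystem-friendly "<url>.txt" name. The query string is kept
// so that YouTube links, which identify the video via "?v=...", stay distinct.
function urlToTxtFilename(url: string): string {
  const { hostname, pathname, search } = new URL(url);
  const slug = (hostname + pathname + search)
    .replace(/[^a-zA-Z0-9]+/g, "_") // collapse runs of other characters
    .replace(/^_+|_+$/g, ""); // trim leading/trailing underscores
  return `${slug}.txt`;
}

// Bundle fetched text with its source so it can be uploaded like any other
// document. How the text was obtained is outside the scope of this sketch.
function toDocument(url: string, text: string): FetchedDocument {
  return { filename: urlToTxtFilename(url), sourceUrl: url, text: text.trim() };
}

const doc = toDocument(
  "https://www.youtube.com/watch?v=abc123",
  "  transcript text goes here  ",
);
console.log(doc.filename);
```

Keeping the source URL alongside the text would let the server cite where the content came from, assuming the document pipeline has somewhere to store it.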


@que-nguyen commented on GitHub (Jun 2, 2024):

> Correct me if I'm wrong but I believe this might be doable using a chrome extension.

Yeah, using a Chrome extension could work, but it might not be the best for mobile browsers.

Reference: github-starred/open-webui#1069