feat: web content pipeline for rag #177


GiteaMirror · 2025-11-11T14:09:54-06:00

GiteaMirror commented

2025-11-11 14:09:54 -06:00

Originally created by @tjbck on GitHub (Jan 13, 2024).

Originally assigned to: @tjbck on GitHub.

https://www.reddit.com/r/LocalLLaMA/comments/192dz3r/q_is_it_possible_to_give_ollama_access_to_a_local/


GiteaMirror added the core label 2025-11-11 14:09:54 -06:00

GiteaMirror closed this issue

2025-11-11 14:09:54 -06:00

GiteaMirror commented

2025-11-11 14:09:55 -06:00

@justinh-rahb commented on GitHub (Jan 14, 2024):

Integrating a feature in Ollama WebUI that allows users to provide a URL and have it automatically processed through RAG would be an exciting addition. However, determining how to trigger this feature raises some questions. For instance, we wouldn't want every URL mentioned in the chat to automatically trigger the RAG process since that could lead to unnecessary processing and potentially unwanted results.

One solution could be adding a button that explicitly triggers the RAG process for a given URL. However, it would be ideal if we could make this feature as automatic as possible while maintaining control over when it's triggered. One way to achieve this could be by implementing some sort of semantic routing model that recognizes specific commands or keywords in the chat input and automatically triggers the RAG process for a provided URL.

For example, if a user types "Read this article" followed by a URL, Ollama WebUI could automatically recognize the command and trigger the RAG process without requiring any additional steps. This approach would maintain the clean interface we currently have.
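As a rough illustration of the trigger idea above, here is a minimal sketch that only routes a message to RAG when it contains a URL and its remaining text resembles a known "read this" phrase. It uses plain bag-of-words cosine similarity as a stand-in for the embedding model a real semantic router would use; `ROUTE_EXAMPLES`, `route`, and the `0.5` threshold are assumptions made for this example, not anything from Ollama WebUI.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Hypothetical example utterances that define the "read this URL" route.
ROUTE_EXAMPLES = [
    "read this article",
    "summarize this page",
    "what does this link say",
    "look at this website",
]


def _vectorize(text):
    """Bag-of-words vector; a real router would use sentence embeddings."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def route(message, threshold=0.5):
    """Return the URLs to send through RAG, or [] if nothing should trigger."""
    urls = URL_RE.findall(message)
    if not urls:
        return []
    # Score only the text around the URLs against the route examples.
    text = _vectorize(URL_RE.sub(" ", message))
    score = max(_cosine(text, _vectorize(ex)) for ex in ROUTE_EXAMPLES)
    return urls if score >= threshold else []
```

With this sketch, "Read this article https://example.com/post" triggers the route, while a message that merely mentions a URL in passing does not.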

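Here's some further information on Semantic Routing:
- [YouTube](https://www.youtube.com/watch?v=ro312jDqAh0)
- [Demo notebook](https://github.com/aurelio-labs/semantic-router/blob/main/docs/00-introduction.ipynb)
- [LangChain example notebook](https://github.com/aurelio-labs/semantic-router/blob/main/docs/03-basic-langchain-agent.ipynb)
- [Repo](https://github.com/aurelio-labs/semantic-router/)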

OpenAI must be implementing a similar concept in ChatGPT Plus to figure out whether to generate images or code or what have you, as it wouldn't be feasible or cost-effective to require two API calls for every message, one with GPT-3.5 to determine what to use and another to perform the action with GPT-4/DALLE-3/CodeInterpreter.


GiteaMirror commented

2025-11-11 14:09:55 -06:00

@RLutsch commented on GitHub (Jan 14, 2024):

How about adding an endpoint you can publish data to? Something like:

```
curl -X POST <myUrl.com>/rag --data-binary @my.csv
```

Then an option to add ChromaDB? Maybe also support for an external DB?
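To make the endpoint idea concrete, here is a minimal standard-library sketch of a server that accepts a CSV body on `/rag` and turns each row into a text document. Only the `/rag` path comes from the suggestion above; `DOCUMENTS` (an in-memory list standing in for a vector store such as ChromaDB), `csv_to_documents`, `RagHandler`, and `serve` are hypothetical names, and this is not how Ollama WebUI implements ingestion.

```python
import csv
import io
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory stand-in for a vector store such as ChromaDB (assumption).
DOCUMENTS = []


def csv_to_documents(body: bytes) -> list:
    """Turn each CSV row into one 'column: value' text document."""
    reader = csv.DictReader(io.StringIO(body.decode("utf-8")))
    return ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]


class RagHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/rag":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        docs = csv_to_documents(self.rfile.read(length))
        DOCUMENTS.extend(docs)
        payload = f"stored {len(docs)} rows\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Run the sketch server until interrupted."""
    HTTPServer((host, port), RagHandler).serve_forever()
```

After calling `serve()`, a client could post a file with something like `curl -X POST http://127.0.0.1:8000/rag --data-binary @my.csv`.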

GiteaMirror commented

2025-11-11 14:09:55 -06:00

@ChingWeiChan commented on GitHub (Jan 16, 2024):

Some additional input on the web content pipeline: I think we could integrate a search API for RAG, such as [SerpApi](https://serpapi.com) (free plan: 100 searches/month), the [Bing Search API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api) (free plan: 1,000 transactions/month), the [Wikipedia API](https://api.wikimedia.org/wiki/Searching_for_Wikipedia_articles_using_Python), or others.
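As one possible shape for the search step, here is a small sketch that queries Wikipedia through the MediaWiki Action API (`action=query&list=search`) and reduces the hits to plain-text snippets that could be fed into a RAG context. Note this uses the Action API rather than the api.wikimedia.org endpoint linked above; the function names and the `User-Agent` string are assumptions made for this example.

```python
import json
import re
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=3):
    """Build a MediaWiki Action API full-text search URL."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_results(payload):
    """Keep title and snippet, stripping the HTML highlight tags from snippets."""
    hits = payload.get("query", {}).get("search", [])
    return [
        {"title": h["title"], "snippet": re.sub(r"<[^>]+>", "", h.get("snippet", ""))}
        for h in hits
    ]


def search(query, limit=3):
    """Perform the network request (not exercised offline)."""
    request = urllib.request.Request(
        build_search_url(query, limit),
        headers={"User-Agent": "rag-search-sketch/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_results(json.load(response))
```

The snippets returned by `search` are short, so a real pipeline would probably still fetch and chunk the full pages behind the top results.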

GiteaMirror commented

2025-11-11 14:09:55 -06:00

@Marclass commented on GitHub (Jan 17, 2024):

> How about adding an endpoint you can publish data to? Something like `curl -X POST <myUrl.com>/rag --data-binary @my.csv`, then an option to add ChromaDB? Maybe also support for an external DB?

The RAG endpoint to scrape web pages was added in #333 with /web.

GiteaMirror commented

2025-11-11 14:09:56 -06:00

@oliverbob commented on GitHub (Jan 23, 2024):

Any update to this yet?

Thanks.


GiteaMirror commented

2025-11-11 14:09:56 -06:00

@tjbck commented on GitHub (Jan 27, 2024):

![image](https://github.com/ollama-webui/ollama-webui/assets/25473318/7f511461-f0f7-4784-bcb7-4297ab1210cb)

You can now add website content to the RAG pipeline directly by using the '#' command followed by the website URL. Let me know if you encounter any issues!

As for API integration support, let's continue the discussion in #586. Thanks!
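For readers curious what a '#' command could involve under the hood, here is a hedged sketch of three steps: pulling `#<url>` commands out of a prompt, reducing fetched HTML to plain text, and splitting the text into overlapping chunks for embedding. The exact command syntax, the function names, and the chunk sizes are assumptions made for illustration; this is not the project's actual implementation.

```python
import re
from html.parser import HTMLParser

# Assumed syntax: '#' at the start of a token, optionally followed by spaces, then a URL.
HASH_URL_RE = re.compile(r"(?<!\S)#\s*(https?://\S+)")


def extract_hash_urls(prompt):
    """Return (urls, prompt with the '#' commands removed)."""
    urls = HASH_URL_RE.findall(prompt)
    cleaned = " ".join(HASH_URL_RE.sub(" ", prompt).split())
    return urls, cleaned


class _TextExtractor(HTMLParser):
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collect visible text only, ignoring script and style bodies.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk(text, size=200, overlap=20):
    """Split text into fixed-size character chunks that overlap slightly."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

Fetching the page itself is left out on purpose; any HTTP client could supply the HTML string passed to `html_to_text`.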

GiteaMirror commented

2025-11-11 14:09:57 -06:00

@oliverbob commented on GitHub (Jan 28, 2024):

> You can now add website content to the RAG pipeline directly by using the '#' command followed by the website URL. Let me know if you encounter any issues!
>
> As for API integration support, let's continue the discussion in #586. Thanks!

Thanks mate. Been waiting for this. Will try the latest update.

GiteaMirror commented

2025-11-11 14:09:59 -06:00

@justinh-rahb commented on GitHub (Jan 28, 2024):

Working great for me 💯 Congrats @tjbck for landing this absolutely huge feature!

![Screenshot 2024-01-28 at 10 07 14 AM](https://github.com/ollama-webui/ollama-webui/assets/52832301/06672288-8bc8-46e1-bf42-cf21f712d4ae)

GiteaMirror referenced this issue

2025-11-11 17:13:46 -06:00

[PR #177] [MERGED] doc: features update #6990
