Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-11 00:04:08 -05:00)
feat: Option not to read specific webpage content after performing a web search #4362
Originally created by @williamgateszhao on GitHub (Mar 10, 2025).
Check Existing Issues
Problem Description
When `process_web_search` in `open-webui/backend/open_webui/routers/retrieval.py` runs, it calls `search_web` to retrieve `web_results`, each of which contains a `snippet`. For some search engines, the `snippet` is not just a brief summary of the page. It is already processed, complete (or nearly complete) webpage content. However, these snippets are simply discarded.
`process_web_search` then uses the default `web_loader` to fetch each webpage again and uses the fetched content as `docs`. In the scenario above this wastes time and resources. Our own scraping of the page is also not necessarily better than the content that professional search providers such as Jina already return.
For example, the `snippet` generated by `jina_search.py` contains the complete webpage content, already converted to Markdown by Jina.
Another example is `tavily.py`. Adding `"include_raw_content": true` to the POST data makes the response include the `raw_content` that Tavily has processed.
Desired Solution you'd like
In the situation above, there is no need to call `get_web_loader` to visit each URL individually. Instead, the `snippet` in `web_results` could be used directly as the `docs` returned by `process_web_search`.
I suggest adding an option that lets users choose this behavior. I suspect it might also address the requirement mentioned in #11488.
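The following is a minimal sketch of the proposed option. It assumes a hypothetical boolean flag `bypass_web_loader` and a caller-supplied `load_urls` function that stands in for the default web loader. `SearchResult` mirrors the link/title/snippet fields described above. `Document` is a stand-in for LangChain's `Document` class (`page_content` plus `metadata`). Nothing here is imported from Open WebUI, and the names are illustrative assumptions, not its actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class SearchResult:
    # Hypothetical stand-in for a search backend result: link, title, snippet.
    link: str
    title: Optional[str] = None
    snippet: Optional[str] = None


@dataclass
class Document:
    # Hypothetical stand-in for a LangChain-style Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def results_to_docs(
    web_results: Iterable[SearchResult],
    bypass_web_loader: bool,  # hypothetical user-facing option
    load_urls: Callable[[List[str]], List[Document]],
) -> List[Document]:
    results = list(web_results)
    if bypass_web_loader:
        # Use the provider's (possibly full-page) snippet as the document body
        # and skip re-fetching every URL.
        return [
            Document(
                page_content=r.snippet,
                metadata={"source": r.link, "title": r.title},
            )
            for r in results
            if r.snippet  # results without usable text are skipped
        ]
    # Default behavior described in the issue: ignore snippets, load each URL.
    return load_urls([r.link for r in results])
```

In this sketch, results without a snippet are dropped when the option is on. A real implementation might instead fall back to loading only those URLs.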
Alternatives Considered
No response
Additional Context
No response