[PR #11655] [CLOSED] feat: add option BYPASS_WEB_SEARCH_RESULT_LINK_SCRAPE #22775

Closed
opened 2026-04-20 04:23:32 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/11655
Author: @williamgateszhao
Created: 3/14/2025
Status: Closed

Base: dev ← Head: BYPASS_WEB_SEARCH_RESULT_LINK_SCRAPE


📝 Commits (2)

  • 30104c6 add option: BYPASS_WEB_SEARCH_RESULT_LINK_SCRAPE
  • 7bbe43b unit test

📊 Changes

5 files changed (+170 additions, -7 deletions)

📝 backend/open_webui/config.py (+6 -0)
📝 backend/open_webui/main.py (+4 -0)
📝 backend/open_webui/routers/retrieval.py (+28 -7)
➕ backend/open_webui/test/apps/webui/routers/test_retrieval.py (+115 -0)
📝 src/lib/components/admin/Settings/WebSearch.svelte (+17 -0)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • [x] Target branch: Please verify that the pull request targets the dev branch.
  • [x] Description: Provide a concise description of the changes made in this pull request.
  • [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • [ ] Documentation: Have you updated relevant documentation Open WebUI Docs, or other documentation sources?
  • [x] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • [x] Testing: Have you written and run sufficient tests for validating the changes?
  • [x] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • [x] Prefix: To clearly categorize this pull request, prefix the pull request title, using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

  • Added an option BYPASS_WEB_SEARCH_RESULT_LINK_SCRAPE, which is exposed on the web search page under admin/settings.
  • This option is disabled by default. When enabled, Open WebUI's built-in web search uses the snippet returned by the search engine as the page content and skips scraping the result URLs.
  • This approach saves time and resources, avoids the frequent HTTP errors that the built-in web search hits while scraping, and takes advantage of the high-quality page extracts returned by search engines optimized for LLMs (such as Jina and Tavily), which may be better than the results of our own scraping.
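
The behavior described above can be illustrated with a minimal sketch. It is not the code from this PR: `SearchResult`, `Document`, `build_documents`, and the `fetch_page` callback are hypothetical names introduced only for illustration, and the real change in `backend/open_webui/routers/retrieval.py` may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SearchResult:
    """Hypothetical shape of one search-engine hit (not Open WebUI's model)."""
    link: str
    title: str
    snippet: str


@dataclass
class Document:
    """Hypothetical container for the text handed to the retrieval step."""
    page_content: str
    source: str


def build_documents(
    results: Iterable[SearchResult],
    bypass_scrape: bool,
    fetch_page: Callable[[str], str],
) -> List[Document]:
    """Turn search hits into documents.

    With bypass_scrape=True the engine's snippet is used as the page content
    and fetch_page is never called. Otherwise each result URL is fetched.
    """
    docs: List[Document] = []
    for result in results:
        if bypass_scrape:
            # Assumed behavior of the new option: no HTTP request per result.
            text = result.snippet
        else:
            # Default behavior: scrape the linked page.
            text = fetch_page(result.link)
        docs.append(Document(page_content=text, source=result.link))
    return docs


if __name__ == "__main__":
    hits = [SearchResult("https://example.com/a", "A", "snippet A")]
    # A stand-in fetcher; a real one would perform an HTTP request.
    docs = build_documents(hits, bypass_scrape=True, fetch_page=lambda url: "")
    print(docs[0].page_content)
```

Passing the fetcher in as a callback keeps the sketch testable without network access, which is also one plausible way to unit test the two code paths.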

Added

  • An option BYPASS_WEB_SEARCH_RESULT_LINK_SCRAPE. When enabled, the search engine's snippet is used as page content, bypassing URL scraping.

Additional Information

  • Note that the current built-in Tavily search support does not ask Tavily to return full page content. I plan to submit a separate small PR to add this functionality.
  • For the issue related to this PR, see #11505, and for the discussion, see #11543.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 04:23:32 -05:00
Reference: github-starred/open-webui#22775