[PR #19092] feat: Web Search domain filter allowlist/blacklist mode #11884

Open
opened 2025-11-11 19:59:32 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/19092
Author: @kjpoccia
Created: 11/10/2025
Status: 🔄 Open

Base: dev ← Head: feat/websearch-filters


📝 Commits (3)

  • e0d5de1 Merge pull request #18978 from open-webui/dev
  • 1929b17 Incorporate is_allowlist flag and filtering to provide option to make domain filter a block list
  • 73bdd9e indentation

📊 Changes

27 files changed (+252 additions, -41 deletions)


📝 backend/open_webui/config.py (+16 -7)
📝 backend/open_webui/main.py (+4 -0)
📝 backend/open_webui/retrieval/web/bing.py (+6 -1)
📝 backend/open_webui/retrieval/web/bocha.py (+10 -2)
📝 backend/open_webui/retrieval/web/brave.py (+10 -2)
📝 backend/open_webui/retrieval/web/duckduckgo.py (+6 -1)
📝 backend/open_webui/retrieval/web/exa.py (+20 -2)
📝 backend/open_webui/retrieval/web/external.py (+6 -1)
📝 backend/open_webui/retrieval/web/firecrawl.py (+6 -1)
📝 backend/open_webui/retrieval/web/google_pse.py (+6 -1)
📝 backend/open_webui/retrieval/web/kagi.py (+10 -2)
📝 backend/open_webui/retrieval/web/main.py (+7 -2)
📝 backend/open_webui/retrieval/web/mojeek.py (+10 -2)
📝 backend/open_webui/retrieval/web/ollama.py (+8 -1)
📝 backend/open_webui/retrieval/web/perplexity.py (+6 -2)
📝 backend/open_webui/retrieval/web/perplexity_search.py (+7 -0)
📝 backend/open_webui/retrieval/web/searchapi.py (+6 -1)
📝 backend/open_webui/retrieval/web/searxng.py (+6 -1)
📝 backend/open_webui/retrieval/web/serpapi.py (+6 -1)
📝 backend/open_webui/retrieval/web/serper.py (+10 -2)

...and 7 more files

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request, and describe your changes before submitting a pull request.

This ensures that large feature PRs are discussed with the community before work on them begins. If the community does not want the feature, or it is not relevant to Open WebUI as a project, that can be identified in the discussion before anyone works on the feature and submits a PR.

Before submitting, make sure you've checked the following:

  • Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
  • Description: Provide a concise description of the changes made in this pull request down below.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: If necessary, update the relevant documentation (Open WebUI Docs), such as environment variables, tutorials, or other documentation sources.
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Take this as an opportunity to take screenshots of the feature/fix and include them in the PR description.
  • Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

  • 🌐 Web Search domain filter now supports both allowlist and blacklist modes via a new WEB_SEARCH_DOMAIN_LIST_IS_ALLOWLIST environment variable (default: true) and a UI toggle (“Treat domain list as allowlist”). This enables excluding specific domains or restricting results to a curated set while maintaining backward-compatible defaults.

Description

This PR enhances Web Search domain filtering by:

  • Adding a mode toggle to treat the domain list as either an allowlist (default) or a blacklist.
  • Expanding the existing post-search filtering so that listed domains can be filtered out.

This addresses long-standing requests to support excluding specific domains while still searching broadly.
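
To make the two modes concrete, here is a minimal sketch of post-search filtering in either direction. It is not the PR's actual `get_filtered_results` code. The function names, the `url`/`link` result keys, the `www.` stripping, and the choice to treat an empty list as "no filtering" are all assumptions made for illustration. The PR only states that filtering uses consistent domain normalization and subdomain handling.

```python
from urllib.parse import urlparse


def normalize_domain(value: str) -> str:
    """Lower-case a domain or URL and strip scheme, port, path, and a leading 'www.'."""
    value = value.strip().lower()
    # Prefix bare domains with '//' so urlparse treats them as a network location.
    host = urlparse(value if "://" in value else f"//{value}").hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_matches(host: str, entry: str) -> bool:
    """True if host equals the list entry or is a subdomain of it."""
    return host == entry or host.endswith("." + entry)


def filter_results(results, domain_list, is_allowlist=True):
    """Hypothetical stand-in for the generalized filter.

    Allowlist mode keeps only results whose host is on the list.
    Blacklist mode keeps only results whose host is NOT on the list.
    """
    entries = [d for d in (normalize_domain(x) for x in domain_list) if d]
    if not entries:
        # Assumption: an empty list means no filtering in either mode.
        return list(results)
    kept = []
    for result in results:
        host = normalize_domain(result.get("url") or result.get("link") or "")
        listed = any(domain_matches(host, entry) for entry in entries)
        if listed == is_allowlist:
            kept.append(result)
    return kept
```

With a list of `["example.com"]`, allowlist mode would keep a result from `docs.example.com` and drop everything else, while blacklist mode would do the opposite. A host such as `notexample.com` is not treated as a subdomain, because the match requires a dot boundary.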

Related: Proposal discussion #18944, PR #16628, PR #10358, and issue #9912.

Added

Allowlist/Blacklist mode

  • New boolean setting and env var WEB_SEARCH_DOMAIN_LIST_IS_ALLOWLIST (default: true).
  • UI toggle: “Treat domain list as allowlist” (on = allowlist, off = blacklist).
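
As an illustration of the backward-compatible default, the following sketch reads the flag from the environment. The helper name `env_flag` and the set of accepted truthy strings are assumptions, and the real `config.py` may parse and persist the value differently.

```python
import os


def env_flag(name: str, default: bool = True) -> bool:
    """Hypothetical helper: read a boolean environment variable.

    Returns `default` when the variable is unset, so existing deployments
    that never set the flag keep allowlist behavior.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# True (allowlist) unless the variable is explicitly set to a falsy value.
WEB_SEARCH_DOMAIN_LIST_IS_ALLOWLIST = env_flag("WEB_SEARCH_DOMAIN_LIST_IS_ALLOWLIST")
```

Under these assumptions, setting `WEB_SEARCH_DOMAIN_LIST_IS_ALLOWLIST=false` would switch the domain list to blacklist mode, matching the "off" position of the UI toggle.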

Changed

  • Generalized get_filtered_results from allowlist-only to allowlist/blacklist behavior, with consistent domain normalization, subdomain handling, and backward-compatible default (allowlist).
  • Updated retrieval.py so web searches that return no matches no longer surface a 404. Instead, the API returns a successful response with an empty result set, and the UI/model clearly shows “0 sources found.” This avoids conflating “endpoint not found” with “no results.”
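
The empty-result behavior can be sketched as follows. This is not the code in retrieval.py. The payload field names (`status`, `sources`, `count`) and both function names are hypothetical. The sketch only shows the idea that an empty list is a valid, successful outcome rather than an error.

```python
from typing import Any


def search_response(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a success payload even when filtering leaves no results.

    Hypothetical shape: the point is that an empty result set is reported
    as a normal response instead of being raised as a 404.
    """
    urls = [r["url"] for r in results if r.get("url")]
    return {"status": True, "sources": urls, "count": len(urls)}


def sources_label(count: int) -> str:
    """Human-readable summary such as '0 sources found'."""
    return f"{count} source{'' if count == 1 else 's'} found"
```

A caller would then distinguish "no results" (a successful payload with `count` equal to 0) from a genuinely missing endpoint, which remains the only case that should produce a 404.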

Breaking Changes

  • BREAKING CHANGE: None

Additional Information

  • A blocklist is by far the most requested web search feature for us. As has been mentioned in previous issues and PRs, the current term "domain filter" already implies a blocklist. This PR adds that functionality while maintaining the default behavior and backward compatibility.
  • I haven't drafted any documentation language yet - if you guys like this change, I'd be happy to.
  • Testing: I have tested Perplexity, external, Tavily, and Google PSE. Testing of other providers is welcome.

Screenshots or Videos

  • Admin Settings Web Search panel with filter list set to allowlist:
    [Image: allow]
  • Admin Settings Web Search panel with filter list set to blacklist:
    [Image: Screenshot 2025-11-10 at 12 19 48 PM]

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-11 19:59:32 -06:00
Reference: github-starred/open-webui#11884