[PR #13664] [CLOSED] feat: use single collection for web search results #46307

Closed
opened 2026-04-29 21:03:44 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/13664
Author: @mMabeck
Created: 5/7/2025
Status: Closed

Base: dev ← Head: web-search


📝 Commits (3)

  • 1e2aa92 feat: enhance web search functionality to support multiple queries and improve result handling
  • dfd5931 refac: remove redundant URL deduplication
  • 1b23502 refac: formatting

📊 Changes

2 files changed (+95 additions, -115 deletions)


📝 backend/open_webui/routers/retrieval.py (+46 -33)
📝 backend/open_webui/utils/middleware.py (+49 -82)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • Target branch: Please verify that the pull request targets the dev branch.
  • Description: Provide a concise description of the changes made in this pull request.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: Have you updated relevant documentation Open WebUI Docs, or other documentation sources?
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Have you written and run sufficient tests to validate the changes?
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

This pull request refactors the web search functionality to use a single collection for all web search results. This also makes sure each query is only embedded once. Smaller changes include supporting multiple queries, improving concurrency, and simplifying the processing logic. The process/web/search endpoint now accepts a list of queries and runs them concurrently.

Previously, each collection was queried separately, which made the resulting context size unpredictable. In practice, most web pages were passed as full context, because the top K results were taken from each collection (web page) individually.
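The effect on context size can be illustrated with a little arithmetic. The sketch below is not code from this PR; the function names and the chunk counts are hypothetical, and it only assumes that each fetched page is split into a handful of chunks and that top K is applied either per collection or once over a single collection.

```python
def chunks_per_collection_query(chunks_per_page: list[int], top_k: int) -> int:
    """Previous behaviour as described: top K is taken from each collection."""
    return sum(min(n, top_k) for n in chunks_per_page)


def chunks_single_collection_query(chunks_per_page: list[int], top_k: int) -> int:
    """New behaviour as described: top K is taken once over all documents."""
    return min(sum(chunks_per_page), top_k)


# Hypothetical numbers: five fetched pages, 2 to 4 chunks each, top K = 3.
pages = [3, 2, 4, 3, 2]
per_collection = chunks_per_collection_query(pages, top_k=3)
single = chunks_single_collection_query(pages, top_k=3)

print(f"per-collection: {per_collection} of {sum(pages)} chunks")
print(f"single collection: {single} of {sum(pages)} chunks")
```

With these made-up numbers, the per-collection approach passes almost every chunk to the model, while the single collection keeps the context bounded by top K.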

Added

  • Deduplicate and flatten search results based on unique links.
  • Updated SearchForm to accept both single query strings and lists of queries, enabling multi-query support (backend/open_webui/routers/retrieval.py).

Changed

  • Refactored process_web_search to:
    • Deduplicate and flatten search results based on unique links.
    • Consolidate all documents into a single collection instead of creating separate collections for each query.
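A minimal sketch of the deduplicate-and-flatten step follows. It is not the PR's implementation: the helper name `flatten_unique_by_link` and the dictionary shape with a `link` key are assumptions made for illustration.

```python
from typing import Iterable


def flatten_unique_by_link(results_per_query: Iterable[list[dict]]) -> list[dict]:
    """Flatten per-query result lists and keep the first result for each link.

    Assumption: every result is a dict with a "link" key. Results without a
    link are dropped, since they cannot be fetched anyway.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for results in results_per_query:
        for result in results:
            link = result.get("link")
            if not link or link in seen:
                continue
            seen.add(link)
            unique.append(result)
    return unique


# Two hypothetical queries that both return the same page.
per_query = [
    [{"link": "https://example.com/a"}, {"link": "https://example.com/b"}],
    [{"link": "https://example.com/b"}, {"link": "https://example.com/c"}],
]
flat = flatten_unique_by_link(per_query)
print([r["link"] for r in flat])
```

Keeping the first occurrence preserves the order in which the search engine ranked the results for the earlier queries.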

Improvements to Middleware:

  • Simplified chat_web_search_handler by:
    • Processing all queries in a single call to process_web_search.
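Running several queries concurrently could look roughly like the sketch below. It uses `asyncio.gather` as one possible approach; `fake_search` is a hypothetical stand-in for the real search call, and nothing here is taken from the PR's diff.

```python
import asyncio


async def fake_search(query: str) -> list[dict]:
    """Hypothetical stand-in for a search engine call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"link": f"https://example.com/{query}", "title": query}]


async def search_all(queries: list[str]) -> list[list[dict]]:
    """Run one search per query concurrently; results keep the query order."""
    return await asyncio.gather(*(fake_search(q) for q in queries))


results = asyncio.run(search_all(["first", "second"]))
print(results)
```

`asyncio.gather` returns results in the order of its arguments, so the i-th result list corresponds to the i-th query even if the searches finish out of order.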

Additional Information

I guess using a Union for the SearchForm model is not ideal, but it would likely break some existing setups if the model no longer accepted a single string.
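One way to keep the rest of the code simple while still accepting both shapes is to normalize the input to a list right away. This is only a sketch: `normalize_queries` is a hypothetical helper, not a function from the PR, and it uses a plain function rather than the actual `SearchForm` model.

```python
from typing import Union


def normalize_queries(queries: Union[str, list[str]]) -> list[str]:
    """Accept a single query string or a list of queries; always return a list."""
    if isinstance(queries, str):
        return [queries]
    return list(queries)


print(normalize_queries("open webui"))
print(normalize_queries(["open webui", "web search"]))
```

Callers that still send a single string keep working, and everything downstream only has to handle the list case.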

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 21:03:44 -05:00

Reference: github-starred/open-webui#46307