[PR #20342] [CLOSED] perf: Fix hybrid search performance regression with parallel collection fetching and BM25 bypass #25575

Closed
opened 2026-04-20 06:00:29 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/20342
Author: @silentoplayz
Created: 1/3/2026
Status: Closed

Base: dev ← Head: perf/fix-hybrid-search-performance-regression


📝 Commits (4)

  • 6b5dee8 perf: fix hybrid search performance regression with parallel collection fetching and BM25 bypass
  • 17003a0 Merge branch 'dev' into perf/fix-hybrid-search-performance-regression
  • c671bd5 perf: optimize collection fetching to address 160k doc regression
  • 895276a fix: revert conditional skip that caused no_docs regression

📊 Changes

1 file changed (+90 additions, -14 deletions)


📝 backend/open_webui/retrieval/utils.py (+90 -14)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request, and describe your changes before submitting a pull request.

This is to ensure large feature PRs are discussed with the community first, before starting work on it. If the community does not want this feature or it is not relevant for Open WebUI as a project, it can be identified in the discussion before working on the feature and submitting the PR.

Before submitting, make sure you've checked the following:

  • [x] Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
  • [x] Description: Provide a concise description of the changes made in this pull request down below.
  • [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • [x] Documentation: If necessary, update relevant documentation Open WebUI Docs like environment variables, the tutorials, or other documentation sources.
  • [x] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • [ ] Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Take this as an opportunity to make screenshots of the feature/fix and include it in the PR description.
  • [x] Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • [x] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • [x] Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

This PR addresses a critical performance regression introduced in version 0.6.26, in which hybrid search suffers dramatic latency increases even when BM25 is disabled. The regression makes retrieval workflows impractical for collections with 10k+ files, where response times rise from 10-30 seconds (v0.6.25) to 3+ minutes, and completely unusable for larger collections (160k+ files), which hit timeouts at 15-20 minutes.

Root causes identified:

  1. Sequential collection fetching - Collections were fetched one by one in a synchronous loop before any parallel processing, creating a bottleneck with multiple collections
  2. Unnecessary BM25 processing - Even when BM25 weight was set to 0, the system still initialized BM25 retrievers and processed enriched texts

Solutions implemented:

  1. Parallel collection fetching - Replaced sequential loop with asyncio.gather to fetch multiple collections concurrently, eliminating N-1 sequential waits
  2. Early BM25 bypass - Added fast-path logic to skip BM25 initialization entirely when weight is 0 and enrichment is disabled
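
The difference between the sequential loop and the asyncio.gather approach can be illustrated with a small timing sketch. This is not the code from the PR: fetch_collection is a hypothetical stand-in that sleeps to simulate I/O latency, and the names and delays are assumptions chosen only for illustration.

```python
import asyncio
import time


async def fetch_collection(name: str, delay: float = 0.1) -> dict:
    """Hypothetical stand-in for a vector-store fetch; sleeps to simulate I/O."""
    await asyncio.sleep(delay)
    return {"name": name, "documents": [f"{name}-doc"]}


async def fetch_sequential(names: list) -> dict:
    """Old shape: each fetch waits for the previous one to finish."""
    results = {}
    for name in names:
        results[name] = await fetch_collection(name)
    return results


async def fetch_parallel(names: list) -> dict:
    """New shape: all fetches are started together and awaited as a group."""
    fetched = await asyncio.gather(*(fetch_collection(n) for n in names))
    # gather returns results in the same order as its arguments.
    return dict(zip(names, fetched))


async def timed(fetch_fn, names: list):
    """Run one fetch strategy and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await fetch_fn(names)
    return results, time.perf_counter() - start


async def main(names: list):
    seq, t_seq = await timed(fetch_sequential, names)
    par, t_par = await timed(fetch_parallel, names)
    return seq, par, t_seq, t_par
```

Running asyncio.run(main(["a", "b", "c", "d"])) should return identical results from both strategies, with the parallel variant taking roughly one delay instead of four. A real vector store may not scale this cleanly if the fetches contend for the same backend.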

Added

  • New fetch_collection_data() async helper function to fetch multiple collections in parallel using asyncio.gather
  • Early bypass logic in query_doc_with_hybrid_search() to skip BM25 processing when not needed
  • Debug logging for BM25 bypass activation

Changed

  • query_collection_with_hybrid_search() now uses parallel collection fetching instead of sequential loop
  • Collection fetching logic refactored for better concurrency and reduced latency

Fixed

  • Performance regression in hybrid search (Fixes #20327)
    • Restored response times to v0.6.25 levels when BM25 weight is 0
    • Eliminated 15-20 minute timeouts for large collections (160k+ files)
    • Reduced response time from ~3 minutes to ~10-30 seconds for medium collections (~10k files)
  • Sequential collection fetching bottleneck that caused delays with multiple collections

Breaking Changes

  • None - changes are backward compatible and only activate under specific conditions

Additional Information

Technical Details

Parallel Collection Fetching:

  • Uses asyncio.gather to fetch collections concurrently
  • Maintains individual error handling for each collection
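  • Reduces total fetch time from roughly N × T to ~T for N collections, where T is the time to fetch a single collection (assuming the fetches do not contend for the same backend resources)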
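
One way to keep per-collection error handling while fetching concurrently is to pass return_exceptions=True to asyncio.gather and inspect each outcome. The sketch below is an assumption about the general shape, not the PR's implementation: blocking_get is a hypothetical synchronous client call, wrapped with asyncio.to_thread on the assumption that the underlying vector-store client is blocking, and the function body is illustrative even though it reuses the fetch_collection_data name.

```python
import asyncio
import logging
from typing import Optional

log = logging.getLogger(__name__)


def blocking_get(collection_name: str) -> dict:
    """Hypothetical synchronous vector-store call; fails for unknown collections."""
    if collection_name.startswith("missing"):
        raise KeyError(collection_name)
    return {"documents": [f"{collection_name}-doc"]}


async def fetch_collection_data(collection_names: list) -> dict:
    """Fetch all collections concurrently; a failed fetch maps to None."""
    # Run each blocking call in a worker thread so the calls can overlap.
    tasks = [asyncio.to_thread(blocking_get, name) for name in collection_names]
    # return_exceptions=True keeps one failure from cancelling the others.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)

    results: dict = {}
    for name, outcome in zip(collection_names, outcomes):
        if isinstance(outcome, Exception):
            log.warning("Failed to fetch collection %s: %r", name, outcome)
            results[name] = None
        else:
            results[name] = outcome
    return results
```

With this shape, asyncio.run(fetch_collection_data(["a", "missing-b", "c"])) would return data for "a" and "c" and None for "missing-b", so one bad collection does not abort the whole query.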

Early BM25 Bypass:

  • Only activates when both conditions are true:
    • hybrid_bm25_weight <= 0 (BM25 not wanted)
    • not enable_enriched_texts (No metadata enrichment wanted)
  • Falls through to existing logic when conditions aren't met
  • Skips: BM25 text enrichment, BM25 retriever initialization, EnsembleRetriever creation
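
The bypass condition described above can be sketched as a small guard. The parameter names hybrid_bm25_weight and enable_enriched_texts come from the description; should_bypass_bm25, query_doc, and the two search callables are hypothetical and stand in for the real retriever setup, which is not shown in this PR description.

```python
import logging
from typing import Callable

log = logging.getLogger(__name__)


def should_bypass_bm25(hybrid_bm25_weight: float, enable_enriched_texts: bool) -> bool:
    """True only when BM25 is weighted out AND text enrichment is disabled."""
    return hybrid_bm25_weight <= 0 and not enable_enriched_texts


def query_doc(
    query: str,
    vector_search: Callable,   # hypothetical: vector-only retrieval
    hybrid_search: Callable,   # hypothetical: existing BM25 + vector path
    hybrid_bm25_weight: float,
    enable_enriched_texts: bool,
) -> list:
    """Take the fast path when BM25 is not needed, else fall through."""
    if should_bypass_bm25(hybrid_bm25_weight, enable_enriched_texts):
        log.debug("BM25 bypass active: weight=%s, enrichment disabled", hybrid_bm25_weight)
        return vector_search(query)
    # Conditions not met: existing hybrid logic runs unchanged.
    return hybrid_search(query)
```

Keeping the condition in its own function makes the two-part requirement easy to test: a weight of 0 with enrichment enabled still takes the existing hybrid path.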

Testing Note

Code changes are conservative:

  • Parallel collection fetching uses standard asyncio.gather patterns
  • Early BM25 bypass only activates when weight=0 AND enrichment disabled
  • Existing behavior unchanged when conditions not met

Performance testing requested:
@galvanoid (original issue reporter) - Could you test this PR with your 10k and 160k file collections to verify it resolves the latency issues you reported?

Expected improvements:

  • ~10k files: Response time should return to v0.6.25 levels (~10-30s vs ~3min)
  • ~160k files: Should eliminate 15-20min timeouts

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 06:00:29 -05:00

Reference: github-starred/open-webui#25575