[PR #20342] perf: Fix hybrid search performance regression with parallel collection fetching and BM25 bypass #64431

GiteaMirror · 2026-05-06T10:00:40-05:00

Original Pull Request: https://github.com/open-webui/open-webui/pull/20342

State: closed
Merged: No

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request, and describe your changes before submitting a pull request.

This is to ensure large feature PRs are discussed with the community first, before starting work on it. If the community does not want this feature or it is not relevant for Open WebUI as a project, it can be identified in the discussion before working on the feature and submitting the PR.

Before submitting, make sure you've checked the following:

[x] Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
[x] Description: Provide a concise description of the changes made in this pull request down below.
[x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
[x] Documentation: If necessary, update relevant documentation Open WebUI Docs like environment variables, the tutorials, or other documentation sources.
[x] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
[ ] Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Take this as an opportunity to make screenshots of the feature/fix and include it in the PR description.
[x] Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
[x] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
[x] Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
- BREAKING CHANGE: Significant changes that may affect compatibility
- build: Changes that affect the build system or external dependencies
- ci: Changes to our continuous integration processes or workflows
- chore: Refactor, cleanup, or other non-functional code changes
- docs: Documentation update or addition
- feat: Introduces a new feature or enhancement to the codebase
- fix: Bug fix or error correction
- i18n: Internationalization or localization changes
- perf: Performance improvement
- refactor: Code restructuring for better maintainability, readability, or scalability
- style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
- test: Adding missing tests or correcting existing tests
- WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

This PR addresses a critical performance regression introduced in v0.6.26: hybrid search with BM25 disabled became far slower. For collections with 10k+ files, response times rose from 10-30 seconds (v0.6.25) to 3+ minutes, making retrieval workflows impractical. Larger collections (160k+ files) became unusable, timing out after 15-20 minutes.

Root causes identified:

1. Sequential collection fetching - Collections were fetched one by one in a synchronous loop before any parallel processing, creating a bottleneck when multiple collections are queried
2. Unnecessary BM25 processing - Even when the BM25 weight was set to 0, the system still initialized BM25 retrievers and processed enriched texts

Solutions implemented:

1. Parallel collection fetching - Replaced the sequential loop with asyncio.gather to fetch multiple collections concurrently, eliminating N-1 sequential waits
2. Early BM25 bypass - Added fast-path logic to skip BM25 initialization entirely when the BM25 weight is 0 and enrichment is disabled

Added

New fetch_collection_data() async helper function to fetch multiple collections in parallel using asyncio.gather
Early bypass logic in query_doc_with_hybrid_search() to skip BM25 processing when not needed
Debug logging for BM25 bypass activation

Changed

query_collection_with_hybrid_search() now uses parallel collection fetching instead of a sequential loop
Collection fetching logic refactored for better concurrency and reduced latency

Fixed

Performance regression in hybrid search (Fixes #20327)
- Restored response times to v0.6.25 levels when BM25 weight is 0
- Eliminated 15-20 minute timeouts for large collections (160k+ files)
- Reduced response time from ~3 minutes to ~10-30 seconds for medium collections (~10k files)
Sequential collection fetching bottleneck that caused delays with multiple collections

Breaking Changes

None - changes are backward compatible and only activate under specific conditions

Additional Information

Technical Details

Parallel Collection Fetching:

Uses asyncio.gather to fetch collections concurrently
Maintains individual error handling for each collection
Reduces total fetch time from N × T to ~T for N collections, where T is the time to fetch a single collection
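
The pattern described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the names fetch_one, fetch_collections, and get_collection are hypothetical, and the sketch assumes the underlying fetch is a blocking call, so it is moved to a worker thread with asyncio.to_thread. If the real fetch were already a coroutine, it could be awaited directly instead.

```python
import asyncio
import logging
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)


async def fetch_one(
    name: str, get_collection: Callable[[str], Any]
) -> tuple[str, Optional[Any]]:
    """Fetch one collection; on failure, log and return (name, None)."""
    try:
        # Assumption: get_collection blocks, so run it in a worker thread
        # to let the other fetches proceed at the same time.
        return name, await asyncio.to_thread(get_collection, name)
    except Exception as exc:
        # Per-collection error handling: one failure does not cancel the rest.
        log.warning("Failed to fetch collection %s: %s", name, exc)
        return name, None


async def fetch_collections(
    names: list[str], get_collection: Callable[[str], Any]
) -> dict[str, Optional[Any]]:
    """Fetch all named collections concurrently and map name -> result."""
    pairs = await asyncio.gather(*(fetch_one(n, get_collection) for n in names))
    return dict(pairs)
```

Catching exceptions inside each task keeps the error handling per collection. Passing return_exceptions=True to asyncio.gather would be an alternative way to get a similar effect. Note that the N × T to ~T estimate only holds when the fetches genuinely overlap, for example because they are I/O-bound.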

Early BM25 Bypass:

Only activates when both conditions are true:
- hybrid_bm25_weight <= 0 (BM25 not wanted)
- not enable_enriched_texts (No metadata enrichment wanted)
Falls through to existing logic when conditions aren't met
Skips: BM25 text enrichment, BM25 retriever initialization, EnsembleRetriever creation
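
The bypass condition above can be sketched like this. It is an illustrative outline only: should_bypass_bm25, query_with_optional_bm25, vector_search, and full_hybrid_search are hypothetical names standing in for logic inside query_doc_with_hybrid_search(), whose real signature is not shown in this PR description.

```python
import logging
from typing import Any, Callable

log = logging.getLogger(__name__)


def should_bypass_bm25(hybrid_bm25_weight: float, enable_enriched_texts: bool) -> bool:
    """True only when BM25 is weighted out AND enrichment is disabled."""
    return hybrid_bm25_weight <= 0 and not enable_enriched_texts


def query_with_optional_bm25(
    query: str,
    vector_search: Callable[[str], Any],       # hypothetical vector-only path
    full_hybrid_search: Callable[[str], Any],  # hypothetical existing hybrid path
    hybrid_bm25_weight: float,
    enable_enriched_texts: bool,
) -> Any:
    """Take the fast path when BM25 is not needed, else use the existing logic."""
    if should_bypass_bm25(hybrid_bm25_weight, enable_enriched_texts):
        # Fast path: no text enrichment, no BM25 retriever, no ensemble.
        log.debug("BM25 bypass active: using vector search only")
        return vector_search(query)
    # Conditions not met: fall through to the unchanged hybrid path.
    return full_hybrid_search(query)
```

Keeping the check in a small predicate makes the two conditions easy to test in isolation, and the fall-through branch leaves the existing behavior untouched.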

Testing Note

Code changes are conservative:

Parallel collection fetching uses standard asyncio.gather patterns
Early BM25 bypass only activates when weight=0 AND enrichment disabled
Existing behavior unchanged when conditions not met

Performance testing requested:
@galvanoid (original issue reporter) - Could you test this PR with your 10k and 160k file collections to verify it resolves the latency issues you reported?

Expected improvements:

~10k files: Response time should return to v0.6.25 levels (~10-30s vs ~3min)
~160k files: Should eliminate 15-20min timeouts

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.

GiteaMirror added the pull-request label 2026-05-06 10:00:40 -05:00

GiteaMirror closed this issue

2026-05-06 10:00:44 -05:00

Reference: github-starred/open-webui#64431