[PR #20637] Perf: Optimize retrieval logic for pure vector search with reranker [hybrid search enabled with BM25 weight set to 0], significant performance improvements #64569

Opened 2026-05-06 10:10:45 -05:00 by GiteaMirror

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/20637
Author: @shrutichy91
Created: 1/13/2026
Status: 🔄 Open

Base: dev ← Head: main


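📝 Commits (10+)

  • fe6783c Merge pull request #19030 from open-webui/dev
  • fc05e0a Merge pull request #19405 from open-webui/dev
  • e3faec6 Merge pull request #19416 from open-webui/dev
  • 9899293 Merge pull request #19448 from open-webui/dev
  • 140605e Merge pull request #19462 from open-webui/dev
  • 6f1486f Merge pull request #19466 from open-webui/dev
  • d95f533 Merge pull request #19729 from open-webui/dev
  • a727153 0.6.43 (#20093)
  • 6adde20 Merge pull request #20394 from open-webui/dev
  • f9b0534 Merge pull request #20522 from open-webui/dev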

📊 Changes

1 file changed (+70 additions, -40 deletions)


📝 backend/open_webui/retrieval/utils.py (+70 -40)

📄 Description

Checks hybrid_bm25_weight before computing the BM25 texts and building bm25_retriever.

Also skips the VECTOR_DB_CLIENT.get call, which is not required for pure vector search with a reranker. This significantly improves performance (about 8x) for larger knowledge bases (>3K docs).
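A minimal sketch of the gating described above. FakeVectorDB, bm25_search and retrieve are hypothetical stand-ins, not the actual code in backend/open_webui/retrieval/utils.py, and bm25_search is a term-overlap placeholder rather than real BM25 scoring. The sketch only shows that the full-collection fetch and the BM25 step live inside the bm25_weight > 0 branch:

```python
from dataclasses import dataclass


@dataclass
class FakeVectorDB:
    """Hypothetical stand-in for VECTOR_DB_CLIENT that counts calls."""
    docs: list[str]
    get_calls: int = 0
    search_calls: int = 0

    def get(self) -> list[str]:
        # Full-collection fetch: the expensive call skipped for vector-only search.
        self.get_calls += 1
        return list(self.docs)

    def search(self, query: str, k: int) -> list[str]:
        # Toy "vector" search: rank docs by the number of words shared with the query.
        self.search_calls += 1
        q = set(query.lower().split())
        return sorted(self.docs, key=lambda d: -len(q & set(d.lower().split())))[:k]


def bm25_search(corpus: list[str], query: str, k: int) -> list[str]:
    # Placeholder for building a BM25 retriever over the whole corpus (not real BM25).
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: -sum(w in q for w in d.lower().split()))[:k]


def retrieve(db: FakeVectorDB, query: str, k: int, bm25_weight: float) -> list[str]:
    vector_hits = db.search(query, k)
    if bm25_weight <= 0:
        # Vector-only short-circuit: no full fetch, no BM25 retriever.
        return vector_hits
    corpus = db.get()
    bm25_hits = bm25_search(corpus, query, k)
    # Order-preserving de-duplicated merge; weighted fusion is omitted here.
    return list(dict.fromkeys(vector_hits + bm25_hits))


db = FakeVectorDB(["vector search", "bm25 keyword search", "reranker models"])
retrieve(db, "vector search", k=2, bm25_weight=0.0)
print(db.get_calls)  # 0: the full fetch was skipped
```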

Before submitting, make sure you've checked the following:

  • Target branch: dev
  • Description:

This PR optimizes the document retrieval pipeline by skipping BM25-related data loading and computation when hybrid_bm25_weight <= 0.

These operations are unnecessary for vector-only search and add avoidable latency and memory overhead.

  • Testing:
    Tested on document collections and observed significant performance improvements:

    2k large docs: 13s (previously 2m3s)
    10k small docs: 11s (previously 2m12s)

    This does not break hybrid search with a BM25 weight > 0 or non-hybrid search, because the new code paths are guarded by if conditions.

  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
    Code review completed
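The timings reported under Testing translate into the following speedups. This is plain arithmetic on the reported numbers, assuming "2m3s" means 2 minutes 3 seconds:

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a duration such as 2m3s into seconds."""
    return minutes * 60 + seconds


# (before, after) timings in seconds, as reported under Testing.
runs = {
    "2k large docs": (to_seconds(2, 3), 13),
    "10k small docs": (to_seconds(2, 12), 11),
}

for name, (before, after) in runs.items():
    print(f"{name}: {before}s -> {after}s ({before / after:.1f}x faster)")
# 2k large docs: 123s -> 13s (9.5x faster)
# 10k small docs: 132s -> 11s (12.0x faster)
```

Both measured speedups are somewhat higher than the roughly 8x quoted in the description.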

Changelog Entry

Description

This PR optimizes the document retrieval pipeline by skipping BM25-related data loading and computation when hybrid_bm25_weight <= 0.

Currently, even when BM25 is effectively disabled (weight ≤ 0), the system may still:

  • Fetch documents using VECTOR_DB_CLIENT.get
  • Construct BM25 retrievers in memory

These operations are unnecessary for vector-only search and add avoidable latency and memory overhead, which can cause the app server to hang for knowledge bases with more than 2k docs.
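To see why the BM25 work is wasted at weight 0, consider weighted Reciprocal Rank Fusion, the fusion scheme used by LangChain's EnsembleRetriever. That the ensemble here behaves exactly like this sketch is an assumption; weighted_rrf is a hypothetical helper, not Open WebUI code. Each document's fused score is the sum over retrievers of weight / (rank + c), so a BM25 weight of 0 adds nothing to any document's score:

```python
from collections import defaultdict


def weighted_rrf(
    ranked_lists: list[list[str]], weights: list[float], c: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists with weighted Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first; sorted() stays stable even with reverse=True.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_hits = ["d3", "d1", "d4"]
vector_hits = ["d1", "d2", "d3"]
bm25_weight = 0.0

fused = weighted_rrf([bm25_hits, vector_hits], [bm25_weight, 1.0 - bm25_weight])
print([doc for doc, _ in fused])  # ['d1', 'd2', 'd3', 'd4']
```

With bm25_weight = 0.0, the fused order of the vector hits is exactly vector_hits. The only trace of BM25 is d4, found only by BM25, which is carried along with a score of 0. In this sketch, skipping BM25 therefore does not reorder the vector results; it only drops zero-score BM25-only candidates from the pool handed to a reranker.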

Changed

  • Introduced an explicit short-circuit for the vector-only path when hybrid_bm25_weight <= 0
  • Skipped the VECTOR_DB_CLIENT.get call when BM25 is disabled, since its results are not needed
  • Bypassed BM25 retriever creation and ensemble logic in vector-only scenarios: an if condition was added, and the BM25 retriever logic now runs only when the BM25 weight is > 0
  • Reranking still applies correctly to the vector search results (taking advantage of the reranker option in hybrid search)
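The vector-only path with reranking can be sketched as below. score_fn stands in for a cross-encoder reranker, and overlap_score is a toy scorer; rerank, top_k and relevance_threshold are illustrative names, not the identifiers used in the PR:

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
    relevance_threshold: float = 0.0,
) -> list[tuple[str, float]]:
    """Score (query, doc) pairs, drop low scores, and keep the best top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    kept = [(doc, s) for doc, s in scored if s >= relevance_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


def overlap_score(query: str, doc: str) -> float:
    # Toy scorer: fraction of query words that appear in the document.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)


# Candidates come straight from vector search; no BM25 results are merged in.
vector_hits = ["reranker on vector results", "bm25 keyword index", "vector search basics"]
print(rerank("vector reranker", vector_hits, overlap_score, top_k=2, relevance_threshold=0.1))
# [('reranker on vector results', 1.0), ('vector search basics', 0.5)]
```

Because the reranker only needs the candidate list, it works the same whether that list came from an ensemble or from vector search alone.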

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA) (https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-06 10:10:45 -05:00
Reference: github-starred/open-webui#64569