[PR #21644] [CLOSED] fix: correct document/metadata swap in hybrid search sort #49227

Closed
opened 2026-04-30 01:33:15 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/21644
Author: @hackwell
Created: 2/20/2026
Status: Closed

Base: dev ← Head: fix/hybrid-search-document-metadata-swap


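📝 Commits (10+)

  • fe6783c Merge pull request #19030 from open-webui/dev
  • fc05e0a Merge pull request #19405 from open-webui/dev
  • e3faec6 Merge pull request #19416 from open-webui/dev
  • 9899293 Merge pull request #19448 from open-webui/dev
  • 140605e Merge pull request #19462 from open-webui/dev
  • 6f1486f Merge pull request #19466 from open-webui/dev
  • d95f533 Merge pull request #19729 from open-webui/dev
  • a727153 0.6.43 (#20093)
  • 6adde20 Merge pull request #20394 from open-webui/dev
  • f9b0534 Merge pull request #20522 from open-webui/dev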

📊 Changes

1 file changed (+1 additions, -1 deletions)

📝 backend/open_webui/retrieval/utils.py (+1 -1)

📄 Description

Pull Request Checklist

  • Target branch: Verify that the pull request targets the dev branch.
  • Description: Provided below.
  • Changelog: Provided below.
  • Documentation: No user-facing behavior change, no new env vars or APIs.
  • Dependencies: No new or upgraded dependencies.
  • Testing: Manually tested via direct API call to /api/v1/retrieval/query/collection with a Qdrant-backed knowledge base and hybrid search + reranking enabled. Verified results are returned correctly after the fix. See details below.
  • Agentic AI Code: This fix was identified through manual debugging of a production issue and has been manually tested and verified on a live Open WebUI instance.
  • Code review: Self-reviewed. The change is a one-line fix correcting variable ordering.
  • Design & Architecture: No architectural changes. Single bug fix.
  • Git Hygiene: Single atomic commit with one logical change.
  • Title Prefix: fix:

Changelog Entry

Description

Fix a variable ordering bug in query_doc_with_hybrid_search that causes all hybrid search results to be silently dropped when TOP_K < TOP_K_RERANKER (the default configuration).

Fixed

  • Hybrid search returns empty results despite successful Qdrant queries and reranking: In backend/open_webui/retrieval/utils.py line 295, zip(distances, metadatas, documents) packed the tuples in the wrong order. The subsequent unpacking distances, documents, metadatas = map(list, zip(*sorted_items)) expected (distances, documents, metadatas) order, so the documents and metadatas variables ended up swapped. Downstream, merge_and_sort_query_results checks isinstance(document, str), which returned False for the dict metadata objects now in the documents field, so all results were silently discarded.

Additional Information

Root cause analysis:

The sorting block at lines 293-302 activates when k < k_reranker (default: TOP_K=8, TOP_K_RERANKER=20):

# BEFORE (broken):
sorted_items = sorted(
    zip(distances, metadatas, documents), ...  # order: dist, meta, doc
)
distances, documents, metadatas = map(list, zip(*sorted_items))  # expects: dist, doc, meta

# AFTER (fixed):
sorted_items = sorted(
    zip(distances, documents, metadatas), ...  # order: dist, doc, meta  ← matches unpacking
)
distances, documents, metadatas = map(list, zip(*sorted_items))  # expects: dist, doc, meta
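
The mismatch can be reproduced outside Open WebUI. The following is a minimal sketch with made-up sample data. The helper name `sort_and_unpack`, the sort key, and `reverse=True` are assumptions standing in for the elided arguments above, not the actual code in utils.py.

```python
# Hypothetical sample data; in the real code these come from the reranker.
distances = [0.2, 0.9, 0.5]
documents = ["doc a", "doc b", "doc c"]
metadatas = [{"id": "a"}, {"id": "b"}, {"id": "c"}]


def sort_and_unpack(zipped):
    # Assumed sort: by score (first tuple element), highest first.
    sorted_items = sorted(zipped, key=lambda item: item[0], reverse=True)
    # Transpose the sorted tuples back into three parallel lists.
    return tuple(map(list, zip(*sorted_items)))


# Broken: tuples are packed as (dist, meta, doc) but unpacked as (dist, doc, meta).
_, broken_docs, broken_metas = sort_and_unpack(zip(distances, metadatas, documents))

# Fixed: packing order matches unpacking order.
fixed_dists, fixed_docs, fixed_metas = sort_and_unpack(zip(distances, documents, metadatas))

print(type(broken_docs[0]).__name__)  # the "documents" list now holds dicts
print(type(fixed_docs[0]).__name__)   # the "documents" list holds strings
```

Note that nothing raises here: both orderings sort and unpack cleanly, which is why the swap only shows up further downstream.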

Impact:

  • All RAG hybrid search results are silently lost when reranking is enabled with default settings
  • The Qdrant vector search and BM25 search execute correctly, and the reranker scores documents correctly, but merge_and_sort_query_results discards everything because it receives dicts where it expects strings
  • No error is logged — the failure is completely silent
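
To illustrate why the failure is silent, here is a rough sketch of a type-filtering merge step. `keep_string_documents` is a hypothetical stand-in written for this example; it is not the real merge_and_sort_query_results, and only the isinstance(document, str) check is taken from the description above.

```python
def keep_string_documents(documents, metadatas, distances):
    """Hypothetical stand-in for a downstream filter that skips non-string documents."""
    kept = []
    for document, metadata, distance in zip(documents, metadatas, distances):
        # Entries whose document is not a string are skipped without any error or log.
        if isinstance(document, str):
            kept.append((distance, document, metadata))
    return kept


# Swapped input: dicts sit in the documents slot, strings in the metadatas slot.
swapped = keep_string_documents(
    documents=[{"id": "a"}, {"id": "b"}],
    metadatas=["doc a", "doc b"],
    distances=[0.9, 0.5],
)

# Correct input: strings in the documents slot.
correct = keep_string_documents(
    documents=["doc a", "doc b"],
    metadatas=[{"id": "a"}, {"id": "b"}],
    distances=[0.9, 0.5],
)

print(len(swapped), len(correct))
```

With swapped inputs every entry fails the type check, so the function returns an empty list and the caller sees a valid but empty result.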

How it was discovered:

  • Production Open WebUI instance with Qdrant + Azure OpenAI embeddings + hybrid search + BAAI/bge-reranker-v2-m3
  • Chat responses showed sources_retrieved: count: 0 despite logs showing successful query_doc_with_hybrid_search:result with actual document content and scores
  • Added debug logging to merge_and_sort_query_results which revealed first_doc_type=<class 'dict'> instead of the expected <class 'str'>
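
The exact logging statement used during debugging is not shown in this PR. A sketch of what such a diagnostic might look like follows; the function name `describe_first_doc` and the logger name are assumptions made for illustration.

```python
import logging

log = logging.getLogger("retrieval_debug")  # hypothetical logger name


def describe_first_doc(documents):
    """Build and log a message reporting the type of the first document, if any."""
    first_type = type(documents[0]) if documents else None
    message = f"first_doc_type={first_type}"
    log.debug(message)
    return message


print(describe_first_doc([{"source": "invoice.pdf"}]))  # symptom of the swap
print(describe_first_doc(["plain text chunk"]))         # expected case
```

A check like this is cheap to leave in place at debug level, since a dict in the documents list points straight at an ordering problem upstream.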

Testing

Before fix — API call to /api/v1/retrieval/query/collection:

{"distances":[[]],"documents":[[]],"metadatas":[[]]}

After fix — same API call:

Results: 4 documents
  [0] score=0.8096 type(meta)=dict doc_preview=exporto GmbH | Bücklestraße 5...
  [1] score=0.6950 type(meta)=dict doc_preview=## Anmerkung Der Rechnungsbetrag...
  [2] score=0.1290 type(meta)=dict doc_preview=## Rechnung Nr. 1539664...
  [3] score=0.0582 type(meta)=dict doc_preview=| DATEV Zahlungsdatenservice...
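
The script that produced the summary above is not part of this PR. The following sketch shows one way to build a similar summary from a response body. It assumes the post-fix response has the same nested-list shape as the pre-fix JSON; `summarize` and `preview_len` are hypothetical names.

```python
import json


def summarize(response_text, preview_len=30):
    """Summarize a query response with nested 'distances', 'documents', 'metadatas' lists."""
    body = json.loads(response_text)
    # Each field holds one inner list per query; this sketch reads the first query only.
    scores = body["distances"][0]
    docs = body["documents"][0]
    metas = body["metadatas"][0]

    lines = [f"Results: {len(docs)} documents"]
    for index, (score, meta, doc) in enumerate(zip(scores, metas, docs)):
        lines.append(
            f"  [{index}] score={score:.4f} "
            f"type(meta)={type(meta).__name__} doc_preview={doc[:preview_len]}..."
        )
    return lines


# The pre-fix response body quoted above.
before = '{"distances":[[]],"documents":[[]],"metadatas":[[]]}'
print("\n".join(summarize(before)))
```

Printing type(meta) alongside each preview makes it easy to confirm that metadata entries are dicts and document entries are strings after the fix.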

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-30 01:33:15 -05:00
Reference: github-starred/open-webui#49227