[PR #21643] [CLOSED] fix: correct document/metadata swap in hybrid search sort #65034

Closed
opened 2026-05-06 10:47:38 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/21643
Author: @hackwell
Created: 2/20/2026
Status: Closed

Base: main ← Head: fix/hybrid-search-document-metadata-swap


📝 Commits (1)

  • b3851a8 fix: correct document/metadata swap in hybrid search sort

📊 Changes

1 file changed (+1 addition, -1 deletion)


📝 backend/open_webui/retrieval/utils.py (+1 -1)

📄 Description

Summary

  • Fix swapped documents and metadatas variables in query_doc_with_hybrid_search when k < k_reranker

Problem

In backend/open_webui/retrieval/utils.py, line 295, the zip() call used the order (distances, metadatas, documents):

```python
sorted_items = sorted(
    zip(distances, metadatas, documents), key=lambda x: x[0], reverse=True
)
```

But the subsequent unpacking at line 300 expected (distances, documents, metadatas):

```python
distances, documents, metadatas = map(list, zip(*sorted_items))
```

As a result, documents contained metadata dicts and metadatas contained document strings whenever k < k_reranker, which holds under the default config (TOP_K=8, TOP_K_RERANKER=20).
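A minimal, self-contained reproduction of the mismatch. The distances, document strings, and metadata dicts are made-up illustrative values, not a real Qdrant or reranker response; the packing and unpacking lines are the two quoted above:

```python
# Illustrative stand-ins for the reranker output (values are invented).
distances = [0.2, 0.9, 0.5]
documents = ["doc A text", "doc B text", "doc C text"]
metadatas = [{"source": "a.pdf"}, {"source": "b.pdf"}, {"source": "c.pdf"}]

# Buggy packing order: metadatas second, documents third.
sorted_items = sorted(
    zip(distances, metadatas, documents), key=lambda x: x[0], reverse=True
)
# Unpacking assumes documents second, metadatas third, so the two lists trade places.
distances, documents, metadatas = map(list, zip(*sorted_items))

print(distances)  # [0.9, 0.5, 0.2]
print(documents)  # [{'source': 'b.pdf'}, {'source': 'c.pdf'}, {'source': 'a.pdf'}]
print(metadatas)  # ['doc B text', 'doc C text', 'doc A text']
```

The sort itself still works (distances come out in descending order), which is why nothing looks wrong until the document values are type-checked.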

Downstream, merge_and_sort_query_results checks isinstance(document, str), which returned False for the dict objects, so **all** hybrid search results were silently discarded.
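The sketch below imitates only that downstream type check. `keep_string_documents` is a hypothetical stand-in, not the real merge_and_sort_query_results (whose signature and full logic are not shown in this PR); it exists to show why swapped lists yield an empty result without raising any error:

```python
def keep_string_documents(distances, documents, metadatas):
    """Hypothetical stand-in reproducing only the isinstance(document, str) filter."""
    kept = []
    for distance, document, metadata in zip(distances, documents, metadatas):
        # A metadata dict in the document slot fails this check and is skipped silently.
        if isinstance(document, str):
            kept.append((distance, document, metadata))
    return kept


# Lists as produced by the buggy sort: dicts where strings are expected.
swapped_docs = [{"source": "b.pdf"}, {"source": "a.pdf"}]
swapped_meta = ["doc B text", "doc A text"]
print(keep_string_documents([0.9, 0.2], swapped_docs, swapped_meta))  # []

# Same data in the correct order: both entries survive.
print(len(keep_string_documents([0.9, 0.2], swapped_meta, swapped_docs)))  # 2
```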

Impact

All hybrid search + reranking results are silently dropped when TOP_K < TOP_K_RERANKER (the default configuration). The RAG pipeline appears to work (Qdrant returns results and the reranker scores them), but no documents reach the model context.

Fix

One-line fix: change zip(distances, metadatas, documents) to zip(distances, documents, metadatas) so the packing order matches the unpacking order.
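For reference, here is the corrected sort wrapped in a hypothetical helper, `sort_by_distance`. The PR edits the inline code, so the function name and the empty-input guard are additions for this sketch only. A second hypothetical variant, `sort_by_distance_indexed`, sorts positions once and reorders each list by name; it is one possible way to remove the need to keep zip order and unpack order in sync by hand, and is not part of the PR:

```python
def sort_by_distance(distances, documents, metadatas):
    # Guard added for this sketch: zip(*[]) yields nothing, so unpacking would fail.
    if not distances:
        return [], [], []
    # Pack in the same order as the unpacking below: distance, document, metadata.
    sorted_items = sorted(
        zip(distances, documents, metadatas), key=lambda x: x[0], reverse=True
    )
    distances, documents, metadatas = map(list, zip(*sorted_items))
    return distances, documents, metadatas


def sort_by_distance_indexed(distances, documents, metadatas):
    # Hypothetical alternative: compute the sorted order of positions once,
    # then build each output list explicitly from its own input list.
    order = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    return (
        [distances[i] for i in order],
        [documents[i] for i in order],
        [metadatas[i] for i in order],
    )
```

Python's sort is stable with or without reverse=True, so both helpers keep tied distances in their original relative order and return identical lists for the same input.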

Test plan

  • Verified with a direct API call to /api/v1/retrieval/query/collection; it returns documents with the correct types
  • Confirmed merge_and_sort_query_results receives strings (not dicts) in the documents field
  • Tested end-to-end with a knowledge base containing PDF invoices

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-06 10:47:38 -05:00
Reference: github-starred/open-webui#65034