[PR #1693] [MERGED] feat: hybrid search with reranking #7549

Closed
opened 2025-11-11 17:29:49 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/1693
Author: @buroa
Created: 4/22/2024
Status: Merged
Merged: 4/25/2024
Merged by: @tjbck

Base: dev ← Head: buroa/hybrid-search


📝 Commits (9)

  • 4e0b32b feat: hybrid search
  • db801ae Merge branch 'dev' into buroa/hybrid-search
  • c0259aa feat: hybrid search and reranking support
  • adb009f Merge branch 'dev' into buroa/hybrid-search
  • d5f60b1 Merge branch 'dev' into buroa/hybrid-search
  • c9c9660 fix: address comment in pr #1687
  • e92680a chore: update changelog.md
  • 72090fa chore: update log line
  • 1c1d2c2 fix: query collection api call

📊 Changes

8 files changed (+650 additions, -171 deletions)

View changed files

📝 CHANGELOG.md (+4 -0)
📝 Dockerfile (+9 -3)
📝 backend/apps/rag/main.py (+123 -91)
📝 backend/apps/rag/utils.py (+295 -72)
📝 backend/config.py (+24 -4)
📝 backend/main.py (+2 -0)
📝 src/lib/apis/rag/index.ts (+62 -0)
📝 src/lib/components/documents/Settings/General.svelte (+131 -1)

📄 Description

This adds three features:

  1. Hybrid search. Ensembles BM25 with ChromaDB. I use the documents returned from the ChromaDB collection to build the BM25 retriever.
  2. Reranking support. Using native sentence_transformers, we can rerank the hybrid search results to surface more relevant documents.
  3. Relevance filter. Best used with a reranking model: you can set a threshold to automatically filter out non-relevant documents.
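
To make the three steps above concrete, here is a minimal, dependency-free sketch. It is not the code from this PR. The helper names (`bm25_scores`, `rank`, `fuse`, `rerank_and_filter`) are hypothetical, the fusion rule (weighted reciprocal rank fusion) is an assumption since the description does not say how the two rankings are combined, and the vector and reranker scores are hard-coded stand-ins for what ChromaDB and a sentence_transformers model would return.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def fuse(rankings, weights, c=60):
    """Weighted reciprocal rank fusion over several rankings (assumed rule)."""
    fused = Counter()
    for ranking, weight in zip(rankings, weights):
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (position + c)
    return [doc_id for doc_id, _ in fused.most_common()]


def rerank_and_filter(doc_ids, relevance, threshold):
    """Sort by reranker score and drop documents below the threshold."""
    kept = [i for i in doc_ids if relevance[i] >= threshold]
    return sorted(kept, key=lambda i: relevance[i], reverse=True)


docs = [
    "chroma stores dense vectors",
    "bm25 ranks by keyword overlap",
    "cats sleep all day",
]
keyword_ranking = rank(bm25_scores("bm25 keyword", docs))
vector_ranking = rank([0.9, 0.6, 0.1])  # stand-in for ChromaDB similarities
hybrid = fuse([keyword_ranking, vector_ranking], weights=[0.3, 0.7])
final = rerank_and_filter(hybrid, relevance=[0.2, 0.9, 0.05], threshold=0.1)
```

In this toy run the reranker scores reorder the fused list, and the relevance threshold drops the unrelated third document.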

I updated the UI to allow setting the reranking model and the relevance filter via API calls.

Lastly, query_embeddings_function returns a lambda that produces the correct embeddings for whichever embedding model / engine the user chose. I use it inside the custom ChromaDB retriever when searching, and swapped it in elsewhere where relevant.
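
The pattern described above, a factory that hands back a single-argument callable bound to the chosen engine, can be sketched roughly as follows. This is an illustration only: `make_query_embedder`, the `backends` mapping, and the toy embedders are hypothetical and do not reflect the real signature of `query_embeddings_function`.

```python
from typing import Callable, Dict, List

Embedder = Callable[[str], List[float]]


def make_query_embedder(engine: str, backends: Dict[str, Embedder]) -> Embedder:
    """Return a one-argument callable bound to the chosen engine (hypothetical)."""
    if engine not in backends:
        raise ValueError(f"unknown embedding engine: {engine!r}")
    embed = backends[engine]
    # Callers such as a retriever only need to know "query in, vector out".
    return lambda query: embed(query)


# Toy stand-ins for real embedding backends; they are not real models.
backends: Dict[str, Embedder] = {
    "local": lambda text: [float(len(text))],
    "remote": lambda text: [float(len(text.split()))],
}

embed_query = make_query_embedder("local", backends)
vector = embed_query("hybrid search")
```

The benefit of this shape is that the retriever never branches on the engine itself; it just calls whatever callable it was given.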

Please let me know if I need to explain anything here. I hope it makes sense :)


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-11 17:29:49 -06:00
Reference: github-starred/open-webui#7549