[PR #22892] [CLOSED] fix: use hybrid search and reranking in builtin query_knowledge_files tool #42545

Closed
opened 2026-04-25 14:24:45 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/22892
Author: @daudo
Created: 3/20/2026
Status: Closed

Base: dev ← Head: fix/builtin-knowledge-hybrid-search


📝 Commits (1)

  • 8954487 fix: use hybrid search and reranking in builtin query_knowledge_files tool

📊 Changes

1 file changed (+35 additions, -7 deletions)

View changed files

📝 backend/open_webui/tools/builtin.py (+35 -7)

📄 Description

Pull Request Checklist

  • Target branch: Targets dev
  • Description: See below
  • Changelog: See below
  • Documentation: No documentation changes needed; the existing admin settings for hybrid search and reranking now also apply to the builtin tool
  • Dependencies: None
  • Testing: Manually tested on a live deployment with native function calling enabled, TEI embeddings (bge-m3), TEI reranker (bge-reranker-v2-m3), and model-attached knowledge (10 epub files). Confirmed hybrid search and reranker are now invoked via container logs. Confirmed fallback to plain vector search works when hybrid search is disabled.
  • Agentic AI Code: AI-assisted but human-reviewed, manually tested on live deployment, and verified against the existing middleware implementation
  • Code review: Self-reviewed
  • Design & Architecture: No new settings — uses existing RAG config values
  • Git Hygiene: Single atomic commit, rebased on dev
  • Title Prefix: fix:

Changelog Entry

Description

The builtin query_knowledge_files tool (used when native function calling is enabled) calls query_collection() directly — plain vector search only. The middleware RAG pipeline (get_sources_from_items) uses query_collection_with_hybrid_search() with BM25 hybrid search, reranking, and relevance threshold filtering.

This means that when a model has attached knowledge and native function calling is enabled, the admin-configured RAG quality settings (hybrid search, reranking, relevance threshold, TOP_K_RERANKER) have no effect on knowledge retrieval.

This fix makes query_knowledge_files use the same hybrid search + reranking pipeline as the middleware path, controlled by the same existing config flags.

Added

  • N/A

Changed

  • query_knowledge_files in builtin.py now uses query_collection_with_hybrid_search() when ENABLE_RAG_HYBRID_SEARCH is enabled, with reranking function, TOP_K_RERANKER, RELEVANCE_THRESHOLD, HYBRID_BM25_WEIGHT, and enriched texts support
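
To make the roles of TOP_K_RERANKER and RELEVANCE_THRESHOLD concrete, here is a minimal standalone sketch of a rerank-then-filter step. It is not the code from retrieval/utils.py: `rerank_and_filter`, `toy_score`, and the sample documents are hypothetical names chosen for illustration, and the real pipeline may apply these steps in a different order or with different semantics.

```python
from typing import Callable, List, Tuple


def rerank_and_filter(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
    relevance_threshold: float,
) -> List[Tuple[str, float]]:
    """Score each document against the query, drop low scores, keep the best top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    # Discard anything below the relevance threshold.
    kept = [(doc, score) for doc, score in scored if score >= relevance_threshold]
    # Highest score first, then truncate to top_k entries.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


def toy_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a reranker: fraction of query words found in doc."""
    words = query.lower().split()
    if not words:
        return 0.0
    doc_words = set(doc.lower().split())
    return sum(word in doc_words for word in words) / len(words)


docs = [
    "hybrid search combines bm25 and vector search",
    "reranking reorders retrieved chunks",
    "unrelated text about cooking",
]
results = rerank_and_filter(
    "hybrid vector search", docs, toy_score, top_k=2, relevance_threshold=0.3
)
```

In this toy run only the first document clears the threshold, so the result holds a single entry even though `top_k` is 2. A real reranker such as bge-reranker-v2-m3 would replace `toy_score`.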

Deprecated

  • N/A

Removed

  • N/A

Fixed

  • Builtin query_knowledge_files tool now respects hybrid search and reranking settings, matching the behavior of the middleware RAG pipeline
  • Graceful fallback to plain vector search if hybrid search fails or is disabled (identical to previous behavior)
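
The fallback behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual diff: `search_knowledge` and the three stand-in search functions are hypothetical, and the real query_collection() and query_collection_with_hybrid_search() take more parameters than shown here.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger(__name__)

# Simplified, assumed shape of a retrieval function: (collections, query) -> result dict.
SearchFn = Callable[[List[str], str], Dict]


def search_knowledge(
    collection_names: List[str],
    query: str,
    hybrid_enabled: bool,
    hybrid_search: SearchFn,
    vector_search: SearchFn,
) -> Dict:
    """Prefer hybrid search when enabled; otherwise, or on failure, use vector search."""
    if hybrid_enabled:
        try:
            return hybrid_search(collection_names, query)
        except Exception as exc:
            # Any hybrid failure degrades to the previous plain vector behavior.
            log.warning("Hybrid search failed, falling back to vector search: %s", exc)
    return vector_search(collection_names, query)


# Hypothetical stand-ins for the two retrieval functions.
def fake_vector_search(collection_names: List[str], query: str) -> Dict:
    return {"mode": "vector", "documents": [[f"vector hit for {query!r}"]]}


def fake_hybrid_search(collection_names: List[str], query: str) -> Dict:
    return {"mode": "hybrid", "documents": [[f"hybrid hit for {query!r}"]]}


def broken_hybrid_search(collection_names: List[str], query: str) -> Dict:
    raise RuntimeError("reranker unavailable")
```

Because both branches return the same result shape, the caller does not need to know which path produced the results.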

Security

  • N/A

Breaking Changes

  • N/A

Additional Information

  • The native function calling builtin tools (builtin.py) were introduced in commit 5c1d5223 (Jan 2026), and the knowledge tools were added in c8622adc, using query_collection() from the start. The hybrid search + reranking infrastructure in retrieval/utils.py predates the builtin tools but was never wired into them.
  • The return format of query_collection_with_hybrid_search() is identical to query_collection() (both use merge_and_sort_query_results()), so no changes to result processing were needed.
  • All parameters (RERANKING_FUNCTION, TOP_K_RERANKER, RELEVANCE_THRESHOLD, HYBRID_BM25_WEIGHT, ENABLE_RAG_HYBRID_SEARCH_ENRICHED_TEXTS) are accessed via __request__.app.state, which is already available in the tool function.
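
As a rough illustration of reading these settings from the request object, the sketch below collects them into a dictionary. The flat attribute layout on `app.state`, the `read_rag_settings` helper, the dictionary keys, and the default values are all assumptions for this example; in the real application some values may live on a nested config object, so the exact attribute paths should be checked against the code.

```python
from types import SimpleNamespace
from typing import Any, Dict


def read_rag_settings(request: Any) -> Dict[str, Any]:
    """Collect retrieval settings from request.app.state, using defaults if missing."""
    state = request.app.state
    return {
        "hybrid_enabled": getattr(state, "ENABLE_RAG_HYBRID_SEARCH", False),
        "enriched_texts": getattr(state, "ENABLE_RAG_HYBRID_SEARCH_ENRICHED_TEXTS", False),
        "reranking_function": getattr(state, "RERANKING_FUNCTION", None),
        "top_k_reranker": getattr(state, "TOP_K_RERANKER", None),
        "relevance_threshold": getattr(state, "RELEVANCE_THRESHOLD", None),
        "bm25_weight": getattr(state, "HYBRID_BM25_WEIGHT", None),
    }


# Hypothetical request object that mimics only the attributes used above.
fake_request = SimpleNamespace(
    app=SimpleNamespace(
        state=SimpleNamespace(ENABLE_RAG_HYBRID_SEARCH=True, TOP_K_RERANKER=5)
    )
)
settings = read_rag_settings(fake_request)
```

Reading everything in one place keeps the tool function free of scattered attribute lookups and makes missing settings fall back to explicit defaults.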

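Contributor License Agreement

  • By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA) (https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms.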


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 14:24:45 -05:00

Reference: github-starred/open-webui#42545