[PR #22895] [CLOSED] fix: support TEI reranker format in ExternalReranker with auto-detection #49966

Closed
opened 2026-04-30 02:25:01 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/22895
Author: @daudo
Created: 3/20/2026
Status: Closed

Base: dev ← Head: fix/external-reranker-tei-format


📝 Commits (1)

  • 0f9be21 fix: support TEI reranker format in ExternalReranker with auto-detection

📊 Changes

1 file changed (+86 additions, -11 deletions)

View changed files

📝 backend/open_webui/retrieval/models/external.py (+86 -11)

📄 Description

Pull Request Checklist

  • Target branch: Targets dev
  • Description: See below
  • Changelog: See below
  • Documentation: No user-facing config changes - TEI format is auto-detected transparently
  • Dependencies: None
  • Testing: Manually tested on a live deployment with a TEI reranker (bge-reranker-v2-m3 via ghcr.io/huggingface/text-embeddings-inference:120-1.9.1). Confirmed that the format is auto-detected, that reranking scores are returned correctly, and that detection is thread-safe under 10 parallel rerank calls. Also verified that the Cohere/Jina format path is unchanged (the probe succeeds on the first try and the result is cached).
  • Agentic AI Code: AI-assisted but human-reviewed, manually tested on live deployment
  • Code review: Self-reviewed
  • Design & Architecture: No new settings - format is auto-detected via smart default
  • Git Hygiene: Single atomic commit, rebased on dev
  • Title Prefix: fix:

Changelog Entry

Description

ExternalReranker only supports the Cohere/Jina rerank API format ("documents" key, {"results": [{"relevance_score": ...}]} response). HuggingFace Text Embeddings Inference (TEI) uses a different format ("texts" key, flat array response with "score" key), so it rejects the request with a 422. That error is handled silently and the caller falls back to unreranked results, which means users with TEI rerankers configured get no reranking despite the setting being enabled.
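The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual Open WebUI code: `HTTPStatusError`, `fake_tei_post`, and this `predict` are hypothetical stand-ins, and the stub only assumes that a TEI-like server rejects a body without a "texts" key.

```python
import logging

log = logging.getLogger(__name__)


class HTTPStatusError(Exception):
    """Hypothetical stand-in for the error an HTTP client raises on a non-2xx reply."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def fake_tei_post(payload):
    """Hypothetical TEI stub: rejects any body that lacks the "texts" key."""
    if "texts" not in payload:
        raise HTTPStatusError(422)
    return [{"index": i, "score": 0.5} for i in range(len(payload["texts"]))]


def predict(post, query, documents):
    """Send a Cohere/Jina-style request; return scores, or None on any error."""
    payload = {
        "model": "reranker",  # placeholder model name
        "query": query,
        "documents": documents,
        "top_n": len(documents),
    }
    try:
        data = post(payload)
        return [r["relevance_score"] for r in data["results"]]
    except Exception as exc:
        # A broad handler like this turns the 422 into a quiet None.
        log.debug("rerank failed: %s", exc)
        return None


scores = predict(fake_tei_post, "what is TEI?", ["doc a", "doc b"])
print(scores)  # None: the caller carries on with the unreranked order
```

Because the handler catches everything, nothing in the return value distinguishes a format mismatch from a network error.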

~~Additionally, ChromaClient.search() silently swallows all exceptions (except: return None), making it impossible to diagnose vector search failures.~~

Added

  • Auto-detection of reranker API format (Cohere/Jina vs TEI) on first call, cached for subsequent requests
  • Thread-safe format detection via lock (hybrid search dispatches parallel rerank calls across collections)
  • Support for TEI response format (flat array with "score" key)
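The lock-guarded detection in the list above can be sketched as follows. This is an illustration of the general pattern (check the cache, take the lock, check again), not the code from the PR; `FormatCache`, `slow_probe`, and the example URL are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FormatCache:
    """Caches one detected format per URL; the lock serialises the first probe."""

    _formats = {}              # class-level dict, shared by every instance
    _lock = threading.Lock()

    def __init__(self, url, probe):
        self.url = url
        self.probe = probe     # hypothetical callable(url) -> "cohere" or "tei"

    def get_format(self):
        cached = self._formats.get(self.url)
        if cached is not None:
            return cached      # fast path once the format is known
        with self._lock:
            # Re-check inside the lock: another thread may have finished probing.
            if self.url not in self._formats:
                self._formats[self.url] = self.probe(self.url)
            return self._formats[self.url]


probe_calls = []


def slow_probe(url):
    """Hypothetical probe that records each call and takes a moment to answer."""
    probe_calls.append(url)
    time.sleep(0.05)
    return "tei"


cache = FormatCache("http://tei.example/rerank", slow_probe)
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda _: cache.get_format(), range(10)))

print(results.count("tei"), len(probe_calls))  # 10 1
```

Ten parallel callers all receive the detected format, but only the first one to take the lock actually sends a probe.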

Changed

  • ExternalReranker.predict() now auto-detects and caches the API format instead of assuming Cohere/Jina
  • Response parsing extracted to _parse_response() static method, handles both Cohere/Jina and TEI formats
  • Payload construction extracted to _build_payload() method
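A rough sketch of what the two extracted helpers might look like, based only on the request and response shapes quoted in this description. The function names, signatures, the `fmt` values, and the choice to default missing scores to 0.0 are assumptions, not the PR's actual `_build_payload()` and `_parse_response()`.

```python
def build_payload(fmt, model, query, documents):
    """Return the request body for the given format ("cohere" or "tei")."""
    if fmt == "tei":
        return {"query": query, "texts": documents, "truncate": True}
    return {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": len(documents),
    }


def parse_response(data, n_docs):
    """Return one score per document, in input order, for either response shape."""
    # Cohere/Jina wraps the results in a dict; TEI returns the list directly.
    items = data["results"] if isinstance(data, dict) else data
    scores = [0.0] * n_docs  # assumption: documents missing from the reply score 0.0
    for item in items:
        # Cohere/Jina uses "relevance_score"; TEI uses "score".
        score = item["relevance_score"] if "relevance_score" in item else item["score"]
        scores[item["index"]] = score
    return scores


cohere_reply = {
    "results": [
        {"index": 1, "relevance_score": 0.9},
        {"index": 0, "relevance_score": 0.2},
    ]
}
tei_reply = [{"index": 1, "score": 0.9}, {"index": 0, "score": 0.2}]

print(parse_response(cohere_reply, 2))  # [0.2, 0.9]
print(parse_response(tei_reply, 2))     # [0.2, 0.9]
```

Both replies list the higher-scoring document first, so the parser uses each item's "index" to put scores back in input order.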

Deprecated

  • N/A

Removed

  • N/A

Fixed

  • External reranker now works with TEI endpoints out of the box (no config change needed)
  • ~~ChromaClient.search() now logs exceptions instead of silently returning None~~

Security

  • N/A

Breaking Changes

  • N/A

Additional Information

  • TEI's /rerank endpoint expects {"query": "...", "texts": [...], "truncate": true} and returns [{"index": 0, "score": 0.5}, ...]. Open WebUI sends {"model": "...", "query": "...", "documents": [...], "top_n": N} and expects {"results": [{"index": 0, "relevance_score": 0.5}]}. The 422 from TEI was caught by the generic exception handler in predict(), silently falling back to unreranked results.
  • Format detection sends one Cohere-format probe on the first call. If the endpoint returns 422, the TEI format is used. The result is cached in a class-level dict keyed by URL, so it survives ExternalReranker re-instantiation (which happens on config updates via the admin UI). A threading.Lock prevents concurrent probe requests when hybrid search dispatches parallel rerank calls.
  • ~~The ChromaDB logging change is included because the silent except: return None in search() made it very difficult to diagnose a vector search failure encountered during testing of the reranker integration.~~
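The probe decision and the class-level cache can be sketched together as below. This is a simplified illustration under stated assumptions: `RerankerSketch`, `tei_like_post`, the probe body, and the example URL are hypothetical. The lock is left out for brevity, and any status other than 422 is treated as Cohere/Jina, which may be stricter or looser than the PR's real logic.

```python
class RerankerSketch:
    """Hypothetical stand-in for ExternalReranker; only format detection is shown."""

    _format_cache = {}  # class-level, keyed by URL, so new instances reuse it

    def __init__(self, url, post):
        self.url = url
        self.post = post  # assumed callable(url, payload) -> HTTP status code

    def detect_format(self):
        if self.url not in self._format_cache:
            # Illustrative Cohere-format probe body.
            probe = {"model": "m", "query": "probe", "documents": ["probe"], "top_n": 1}
            status = self.post(self.url, probe)
            # 422 means the body was rejected, as TEI does with "documents".
            self._format_cache[self.url] = "tei" if status == 422 else "cohere"
        return self._format_cache[self.url]


calls = []


def tei_like_post(url, payload):
    """Hypothetical endpoint stub: accepts only bodies that carry "texts"."""
    calls.append(url)
    return 200 if "texts" in payload else 422


first = RerankerSketch("http://tei.example/rerank", tei_like_post)
second = RerankerSketch("http://tei.example/rerank", tei_like_post)  # e.g. after a config update

print(first.detect_format(), second.detect_format(), len(calls))  # tei tei 1
```

Because the dict lives on the class rather than the instance, the second object reuses the first object's result and sends no probe of its own.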

This PR is related to my other PR https://github.com/open-webui/open-webui/pull/22892, which enables reranking for query_knowledge_files, my actual use case.

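Contributor License Agreement

  • By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.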

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-30 02:25:01 -05:00

Reference: github-starred/open-webui#49966