[GH-ISSUE #8478] feat: Allow reranker to be accessed via API instead of local model #53806

Closed
opened 2026-05-05 15:21:42 -05:00 by GiteaMirror · 21 comments
Owner

Originally created by @GrayXu on GitHub (Jan 11, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/8478

Is your feature request related to a problem? Please describe.

Currently, the reranker model used in RAG can only be run locally after being pulled. However, there are now many MaaS providers offering rerankers. I would like to use reranker models via API within open-webui, which would make the server much lighter.

Describe the solution you'd like

Although the OpenAI API does not have a reranker API, there is a widely used API pattern for rerankers in MaaS, such as /v1/rerank, used by projects like siliconflow, xinference, and api-for-open-llm.

Like this:

curl --request POST \
  --url https://api.siliconflow.cn/v1/rerank \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "BAAI/bge-reranker-v2-m3",
  "query": "Apple",
  "documents": [
    "apple",
    "banana",
    "fruit",
    "vegetable"
  ],
  "top_n": 4,
  "return_documents": false,
  "max_chunks_per_doc": 1024,
  "overlap_tokens": 80
}'
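
For reference, here is a minimal Python sketch of the same call (the response handling assumes the common /v1/rerank convention of a "results" list with "index" and "relevance_score" fields; exact fields vary by provider):

import requests

resp = requests.post(
    "https://api.siliconflow.cn/v1/rerank",
    headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"},
    json={
        "model": "BAAI/bge-reranker-v2-m3",
        "query": "Apple",
        "documents": ["apple", "banana", "fruit", "vegetable"],
        "top_n": 4,
        "return_documents": False,
    },
)
resp.raise_for_status()
# Expected (assumed) shape: {"results": [{"index": 0, "relevance_score": 0.98}, ...]}
for item in resp.json().get("results", []):
    print(item["index"], item["relevance_score"])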

Thanks

GiteaMirror added the enhancement, good first issue, help wanted labels 2026-05-05 15:21:43 -05:00
Author
Owner

@tjbck commented on GitHub (Jan 13, 2025):

PR Welcome here!

Author
Owner

@GrayXu commented on GitHub (Jan 22, 2025):

Thanks for your reply, I'll see what I can do.

Author
Owner

@rjmalagon commented on GitHub (Mar 11, 2025):

Llama.cpp server has initial support for reranking jobs.
https://github.com/ggml-org/llama.cpp/blob/master/examples/server/README.md#post-reranking-rerank-documents-according-to-a-given-query

Mixedbread AI released their Qwen2-based rerankers, mxbai-rerank-base-v2 and mxbai-rerank-large-v2.
https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v2
https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2

I think these high-quality rerankers are a good example of models that can be supported by llama.cpp. It would be nice to be able to offload reranking to llama.cpp servers too.
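
For illustration, a rough sketch of what offloading to a llama.cpp server could look like (the endpoint name, startup flag, and response fields here are assumptions taken from the linked server README; check the docs for the build you run):

import requests

# Assumes a local llama-server instance started with reranking enabled
# (e.g. the --reranking flag) and a reranker model such as mxbai-rerank-base-v2.
resp = requests.post(
    "http://localhost:8080/reranking",  # assumed local server address and endpoint
    json={
        "query": "Apple",
        "documents": ["apple", "banana", "fruit", "vegetable"],
    },
)
# The README describes a "results" list of index/relevance_score entries.
for item in resp.json().get("results", []):
    print(item["index"], item["relevance_score"])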

Author
Owner

@ArtrixTech commented on GitHub (Mar 13, 2025):

need this!

Author
Owner

@niyouzhu commented on GitHub (Mar 15, 2025):

support!

Author
Owner

@Jotakak-yu commented on GitHub (Mar 20, 2025):

need this too

Author
Owner

@rgaricano commented on GitHub (Mar 20, 2025):

Before that, reranking needs to be improved with these PRs: https://github.com/open-webui/open-webui/pull/11814, https://github.com/open-webui/open-webui/pull/11497 and https://github.com/open-webui/open-webui/pull/11876

Some of us have been testing by manually integrating all three (it is not possible to make a single pull request with them all together), and it seems to work better. ;)

Author
Owner

@bet0x commented on GitHub (Mar 20, 2025):

Implementing this should be fairly trivial.

1. Install any embedding or reranker model using Infinity: https://michaelfeil.eu/infinity/main/deploy/

export REMOTE_RERANKER_URL=https://<URL>/rerank
export REMOTE_RERANKER_KEY=<KEY>
export REMOTE_RERANKER_MODEL=mixedbread-ai/mxbai-rerank-large-v1
export RELEVANCE_THRESHOLD=0.1

Changes to the RerankCompressor in backend/open_webui/retrieval/utils.py:

class RerankCompressor(BaseDocumentCompressor):
    embedding_function: Any
    top_n: int
    reranking_function: Any
    r_score: float
    use_remote_reranker: bool = False
    remote_reranker_url: Optional[str] = None
    remote_reranker_key: Optional[str] = None
    remote_reranker_model: Optional[str] = None

    class Config:
        extra = "forbid"
        arbitrary_types_allowed = True

    def compress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        log.debug(f"RerankCompressor: Query: {query}")
        log.debug(f"RerankCompressor: Input documents: {[doc.page_content[:100] + '...' for doc in documents]}")
        
        # Don't attempt reranking if there are no documents
        if not documents:
            log.debug("RerankCompressor: No documents to rerank")
            return []
        
        reranking = self.reranking_function is not None or self.use_remote_reranker
        doc_contents = [doc.page_content for doc in documents]

        if reranking:
            if self.use_remote_reranker and self.remote_reranker_url:
                log.debug(f"RerankCompressor: Using remote reranker at {self.remote_reranker_url}")
                reranked = rerank_remote(
                    query=query, 
                    documents=doc_contents,
                    api_url=self.remote_reranker_url,
                    api_key=self.remote_reranker_key,
                    model=self.remote_reranker_model,
                    top_n=self.top_n if self.r_score == 0 else None
                )
                
                # Convert to scores format expected by rest of function
                indices, scores_list = zip(*reranked) if reranked else ([], [])
                scores = [0.0] * len(documents)
                for idx, score in zip(indices, scores_list):
                    scores[idx] = score
                
                log.debug(f"RerankCompressor: Remote reranking scores: {scores}")
            else:
                input_pairs = [(query, doc) for doc in doc_contents]
                log.debug(f"RerankCompressor: Input pairs for reranking function: {[(p[0], p[1][:50] + '...') for p in input_pairs]}")
                
                scores = self.reranking_function.predict(input_pairs)
                log.debug(f"RerankCompressor: Reranking scores: {scores.tolist()}")
                scores = scores.tolist()
        else:
            from sentence_transformers import util

            query_embedding = self.embedding_function(query)
            document_embedding = self.embedding_function(doc_contents)
            scores = util.cos_sim(query_embedding, document_embedding)[0]
            log.debug(f"RerankCompressor: Embedding similarity scores: {scores.tolist()}")
            scores = scores.tolist()

        docs_with_scores = list(zip(documents, scores))
        if self.r_score:
            filtered_docs = [
                (d, s) for d, s in docs_with_scores if s >= self.r_score
            ]
            log.debug(f"RerankCompressor: Filtered {len(docs_with_scores)} down to {len(filtered_docs)} docs with threshold {self.r_score}")
            docs_with_scores = filtered_docs

        result = sorted(docs_with_scores, key=operator.itemgetter(1), reverse=True)
        log.debug(f"RerankCompressor: Sorted results: {[(doc.page_content[:50] + '...', score) for doc, score in result[:min(5, len(result))]]}")
        
        final_results = []
        for doc, doc_score in result[: self.top_n]:
            metadata = doc.metadata.copy()
            metadata["score"] = doc_score
            doc = Document(
                page_content=doc.page_content,
                metadata=metadata,
            )
            final_results.append(doc)
        
        log.debug(f"RerankCompressor: Final top {len(final_results)} documents with scores: {[(doc.metadata.get('score'), doc.page_content[:50] + '...') for doc in final_results]}")
        
        return final_results

You will also need to change the query_doc_with_hybrid_search function:

def query_doc_with_hybrid_search(
    collection_name: str,
    query: str,
    embedding_function,
    k: int,
    reranking_function,
    r: float,
    user: UserModel = None,
) -> dict:
    try:
        log.debug(f"query_doc_with_hybrid_search: Start with query: {query}, k={k}, r={r}")
        
        # Determine which type of reranking to use
        remote_url = os.getenv("REMOTE_RERANKER_URL", "")
        remote_key = os.getenv("REMOTE_RERANKER_KEY", "")
        remote_model = os.getenv("REMOTE_RERANKER_MODEL", "mixedbread-ai/mxbai-rerank-large-v1")
        use_remote_reranker = bool(remote_url)  # use the remote reranker only when a URL is configured
        
        if use_remote_reranker:
            log.debug(f"query_doc_with_hybrid_search: Using remote reranker at {remote_url}")
            # If using the remote reranker, ignore the local one
            reranking_function = None
        elif reranking_function is not None:
            log.debug(f"query_doc_with_hybrid_search: Using local reranker type: {type(reranking_function)}")
        else:
            log.debug("query_doc_with_hybrid_search: No reranker provided")
        
        # Check whether the collection exists first
        if not VECTOR_DB_CLIENT.has_collection(collection_name=collection_name):
            log.debug(f"query_doc_with_hybrid_search: Collection {collection_name} does not exist")
            return {
                "distances": [[]],
                "documents": [[]],
                "metadatas": [[]],
            }
        
        result = VECTOR_DB_CLIENT.get(collection_name=collection_name)
        
        # Check whether the result is None or empty
        if result is None or not result.documents or not result.documents[0]:
            log.debug(f"query_doc_with_hybrid_search: No documents found in collection {collection_name}")
            return {
                "distances": [[]],
                "documents": [[]],
                "metadatas": [[]],
            }
            
        log.debug(f"query_doc_with_hybrid_search: Got {len(result.documents[0])} documents from collection")

        bm25_retriever = BM25Retriever.from_texts(
            texts=result.documents[0],
            metadatas=result.metadatas[0],
        )
        bm25_retriever.k = k

        vector_search_retriever = VectorSearchRetriever(
            collection_name=collection_name,
            embedding_function=embedding_function,
            top_k=k,
        )

        ensemble_retriever = EnsembleRetriever(
            retrievers=[bm25_retriever, vector_search_retriever], weights=[0.5, 0.5]
        )
        
        compressor = RerankCompressor(
            embedding_function=embedding_function,
            top_n=k,
            reranking_function=reranking_function,
            r_score=r,
            use_remote_reranker=use_remote_reranker,
            remote_reranker_url=remote_url,
            remote_reranker_key=remote_key,
            remote_reranker_model=remote_model,
        )

        compression_retriever = ContextualCompressionRetriever(
            base_compressor=compressor, base_retriever=ensemble_retriever
        )

        retrieved_docs = compression_retriever.invoke(query)
        
        # Handle the case where no documents were returned after filtering
        if not retrieved_docs:
            log.debug("query_doc_with_hybrid_search: No documents matched the relevance threshold")
            return {
                "distances": [[]],
                "documents": [[]],
                "metadatas": [[]],
            }

        formatted_result = {
            "distances": [[d.metadata.get("score") for d in retrieved_docs]],
            "documents": [[d.page_content for d in retrieved_docs]],
            "metadatas": [[d.metadata for d in retrieved_docs]],
        }
        
        log.debug(f"query_doc_with_hybrid_search: Final result scores: {formatted_result['distances']}")
        log.debug(f"query_doc_with_hybrid_search: Top result content: {formatted_result['documents'][0][0][:100] if formatted_result['documents'][0] else 'None'}")
        
        log.info(
            "query_doc_with_hybrid_search:result "
            + f'{formatted_result["metadatas"]} {formatted_result["distances"]}'
        )
        return formatted_result
    except Exception as e:
        log.error(f"query_doc_with_hybrid_search error: {e}")
        raise e
Author
Owner

@Ithanil commented on GitHub (Mar 21, 2025):

@bet0x Wonderful! Please make this a PR, although I didn't check how much conflict there will be with the other PRs mentioned above. Maybe it's good to have them merged first.

That said, I think the function rerank_remote is missing.

Author
Owner

@bet0x commented on GitHub (Mar 24, 2025):

@Ithanil Hello. I had a busy week; if someone is willing to take on the PR and mention me so I can check and implement anything missing, that would be a help!

rerank_remote

def rerank_remote(query, documents, api_url, api_key, model="mixedbread-ai/mxbai-rerank-large-v1", top_n=None):
    try:
        # If there are no documents, return an empty list
        if not documents:
            log.debug("rerank_remote: No documents to rerank")
            return []
            
        headers = {
            'accept': 'application/json',
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        }
        
        payload = {
            "query": query,
            "documents": documents,
            "return_documents": False,
            "raw_scores": False,
            "model": model
        }
        
        if top_n:
            payload["top_n"] = top_n
            
        log.debug(f"Sending rerank request to {api_url} with {len(documents)} documents")
        
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()
        result = response.json()
        
        log.debug(f"Rerank API response: {result}")
        
        # Make sure there are results
        if not result.get("results"):
            return []
            
        # Process the results
        reranked = [(r["index"], r["relevance_score"]) for r in result["results"]]
        return reranked
        
    except Exception as e:
        log.error(f"Remote reranking error: {e}")
        # Return the original order with zero scores
        return [(i, 0.0) for i in range(len(documents))]
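
As a quick illustration (a sketch only), the (index, score) pairs returned by rerank_remote map back onto the original document list like this, reusing the environment variables from the earlier comment:

import os

docs = ["apple", "banana", "fruit", "vegetable"]
pairs = rerank_remote(
    query="Apple",
    documents=docs,
    api_url=os.getenv("REMOTE_RERANKER_URL", ""),
    api_key=os.getenv("REMOTE_RERANKER_KEY", ""),
    model=os.getenv("REMOTE_RERANKER_MODEL", "mixedbread-ai/mxbai-rerank-large-v1"),
    top_n=4,
)
# pairs is a list of (index, relevance_score) tuples, e.g. [(0, 0.97), (2, 0.41), ...]
for idx, score in pairs:
    print(f"{score:.3f}  {docs[idx]}")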
Author
Owner

@Phlogi commented on GitHub (Mar 24, 2025):

I'm willing to prepare a PR for this after the other open PRs get merged. I think we could run the remote reranking in parallel too (as with my https://github.com/open-webui/open-webui/pull/11814), but it should be configurable because of possible rate limiting.

Author
Owner

@athoik commented on GitHub (Apr 24, 2025):

Hello,

Is somebody still working on this?

I really support that feature! It's a must have!

Thank you!

Author
Owner

@RAPHCVR commented on GitHub (Apr 30, 2025):

Very interesting feature! Thanks for referencing this.

Author
Owner

@tjbck commented on GitHub (May 5, 2025):

Related #13261

Author
Owner

@jescalada commented on GitHub (May 10, 2025):

Hi folks, I made a proof-of-concept PR based on the suggestions by @bet0x. Let me know if it works for your specific API/workflow! #13745

Author
Owner

@athoik commented on GitHub (May 10, 2025):

... And we officially have an external reranker via d5fd3b3600

Kudos to everyone involved!

Author
Owner

@tjbck commented on GitHub (May 10, 2025):

As mentioned by @athoik, this should be addressed in dev with d5fd3b3600. Testing wanted here!

Author
Owner

@athoik commented on GitHub (May 10, 2025):

Testing soon!

Author
Owner

@athoik commented on GitHub (May 10, 2025):

PS: a minor issue when building the interface...

x Build failed in 11.69s
error during build:
[vite:json] [plugin vite:json] src/lib/i18n/locales/it-IT/translation.json (1317:42): Failed to parse JSON file, invalid JSON syntax found at position 90808
file: /home/user/open-webui/src/lib/i18n/locales/it-IT/translation.json:1317:42

1315:   "Youtube": "Youtube",
1316:   "Youtube Language": "Lingua Youtube",
1317:   "Youtube Proxy URL": "URL proxy Youtube",
                                                 ^
1318: }

The last comma is causing the build to fail.

Author
Owner

@athoik commented on GitHub (May 10, 2025):

We have a minor error when using the external reranker...

2025-05-10 18:04:49.925 | ERROR    | open_webui.retrieval.utils:query_doc_with_hybrid_search:174 - Error querying doc 5df99d16-7baa-4aef-be1d-feec771198bb with hybrid search: 'list' object has no attribute 'tolist' - {}
Traceback (most recent call last):

  File "/home/user/.conda/envs/open-webui/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x7fe4619049a0>
    └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
  File "/home/user/.conda/envs/open-webui/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
    │    └ <function Thread.run at 0x7fe461904680>
    └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
  File "/home/user/.conda/envs/open-webui/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
    │    │        │    └ (<weakref at 0x7fe3843e0e00; to 'ThreadPoolExecutor' at 0x7fe34610d050>, <_queue.SimpleQueue object at 0x7fe34614a980>, None,...
    │    │        └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
    │    └ <function _worker at 0x7fe460a02660>
    └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
  File "/home/user/.conda/envs/open-webui/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
    │         └ <function _WorkItem.run at 0x7fe460a027a0>
    └ <concurrent.futures.thread._WorkItem object at 0x7fe3465df390>
  File "/home/user/.conda/envs/open-webui/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             │    │   │    │       │    └ {}
             │    │   │    │       └ <concurrent.futures.thread._WorkItem object at 0x7fe3465df390>
             │    │   │    └ ('5df99d16-7baa-4aef-be1d-feec771198bb', 'xyz')
             │    │   └ <concurrent.futures.thread._WorkItem object at 0x7fe3465df390>
             │    └ <function query_collection_with_hybrid_search.<locals>.process_query at 0x7fe34396e480>
             └ <concurrent.futures.thread._WorkItem object at 0x7fe3465df390>

  File "/home/user/open-webui/backend/open_webui/retrieval/utils.py", line 340, in process_query
    result = query_doc_with_hybrid_search(
             └ <function query_doc_with_hybrid_search at 0x7fe34b4322a0>

> File "/home/user/open-webui/backend/open_webui/retrieval/utils.py", line 148, in query_doc_with_hybrid_search
    result = compression_retriever.invoke(query)
             │                     │      └ 'xyz'
             │                     └ <function BaseRetriever.invoke at 0x7fe360176020>
             └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

  File "/home/user/.conda/envs/open-webui/lib/python3.11/site-packages/langchain_core/retrievers.py", line 258, in invoke
    result = self._get_relevant_documents(
             │    └ <function ContextualCompressionRetriever._get_relevant_documents at 0x7fe360175da0>
             └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...
  File "/home/user/.conda/envs/open-webui/lib/python3.11/site-packages/langchain/retrievers/contextual_compression.py", line 48, in _get_relevant_documents
    compressed_docs = self.base_compressor.compress_documents(
                      │    │               └ <function RerankCompressor.compress_documents at 0x7fe34b431c60>
                      │    └ RerankCompressor(embedding_function=<function chat_completion_files_handler.<locals>.<lambda>.<locals>.<lambda> at 0x7fe345d1...
                      └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

  File "/home/user/open-webui/backend/open_webui/retrieval/utils.py", line 821, in compress_documents
    docs_with_scores = list(zip(documents, scores.tolist()))
                                │          └ [0.59824234, 0.7533401, 0.6208291, 0.509887, 0.5749443, 0.25218645, 0.34184664, 0.23707257, 0.6304352, 0.45202848, 0.40952834...
                                └ [Document(metadata={'collection_name': 'open-webui_5df99d16-7baa-4aef-be1d-feec771198bb', 'file_id': '9eb17a77-d35a-4738-bce6...

AttributeError: 'list' object has no attribute 'tolist'
2025-05-10 18:04:49.931 | ERROR    | open_webui.retrieval.utils:process_query:352 - Error when querying the collection with hybrid_search: 'list' object has no attribute 'tolist' - {}
Traceback (most recent call last):

  File "/home/user/.conda/envs/open-webui/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x7fe4619049a0>
    └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
  File "/home/user/.conda/envs/open-webui/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
    │    └ <function Thread.run at 0x7fe461904680>
    └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
  File "/home/user/.conda/envs/open-webui/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
    │    │        │    └ (<weakref at 0x7fe3843e0e00; to 'ThreadPoolExecutor' at 0x7fe34610d050>, <_queue.SimpleQueue object at 0x7fe34614a980>, None,...
    │    │        └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
    │    └ <function _worker at 0x7fe460a02660>
    └ <Thread(ThreadPoolExecutor-6_0, started 140614056269504)>
  File "/home/user/.conda/envs/open-webui/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
    │         └ <function _WorkItem.run at 0x7fe460a027a0>
    └ <concurrent.futures.thread._WorkItem object at 0x7fe3465df390>
  File "/home/user/.conda/envs/open-webui/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             │    │   │    │       │    └ {}
             │    │   │    │       └ <concurrent.futures.thread._WorkItem object at 0x7fe3465df390>
             │    │   │    └ ('5df99d16-7baa-4aef-be1d-feec771198bb', 'xyz')
             │    │   └ <concurrent.futures.thread._WorkItem object at 0x7fe3465df390>
             │    └ <function query_collection_with_hybrid_search.<locals>.process_query at 0x7fe34396e480>
             └ <concurrent.futures.thread._WorkItem object at 0x7fe3465df390>

> File "/home/user/open-webui/backend/open_webui/retrieval/utils.py", line 340, in process_query
    result = query_doc_with_hybrid_search(
             └ <function query_doc_with_hybrid_search at 0x7fe34b4322a0>

  File "/home/user/open-webui/backend/open_webui/retrieval/utils.py", line 175, in query_doc_with_hybrid_search
    raise e

  File "/home/user/open-webui/backend/open_webui/retrieval/utils.py", line 148, in query_doc_with_hybrid_search
    result = compression_retriever.invoke(query)
             │                     │      └ 'xyz'
             │                     └ <function BaseRetriever.invoke at 0x7fe360176020>
             └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

  File "/home/user/.conda/envs/open-webui/lib/python3.11/site-packages/langchain_core/retrievers.py", line 258, in invoke
    result = self._get_relevant_documents(
             │    └ <function ContextualCompressionRetriever._get_relevant_documents at 0x7fe360175da0>
             └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...
  File "/home/user/.conda/envs/open-webui/lib/python3.11/site-packages/langchain/retrievers/contextual_compression.py", line 48, in _get_relevant_documents
    compressed_docs = self.base_compressor.compress_documents(
                      │    │               └ <function RerankCompressor.compress_documents at 0x7fe34b431c60>
                      │    └ RerankCompressor(embedding_function=<function chat_completion_files_handler.<locals>.<lambda>.<locals>.<lambda> at 0x7fe345d1...
                      └ ContextualCompressionRetriever(base_compressor=RerankCompressor(embedding_function=<function chat_completion_files_handler.<l...

  File "/home/user/open-webui/backend/open_webui/retrieval/utils.py", line 821, in compress_documents
    docs_with_scores = list(zip(documents, scores.tolist()))
                                │          └ [0.59824234, 0.7533401, 0.6208291, 0.509887, 0.5749443, 0.25218645, 0.34184664, 0.23707257, 0.6304352, 0.45202848, 0.40952834...
                                └ [Document(metadata={'collection_name': 'open-webui_5df99d16-7baa-4aef-be1d-feec771198bb', 'file_id': '9eb17a77-d35a-4738-bce6...

AttributeError: 'list' object has no attribute 'tolist'
2025-05-10 18:04:49.938 | DEBUG    | open_webui.retrieval.utils:get_sources_from_files:555 - Error when using hybrid search, using non hybrid search as fallback. - {}

The following fixes the issue:

diff --git a/backend/open_webui/retrieval/utils.py b/backend/open_webui/retrieval/utils.py
index b952080d3..4496118a5 100644
--- a/backend/open_webui/retrieval/utils.py
+++ b/backend/open_webui/retrieval/utils.py
@@ -818,7 +818,7 @@ class RerankCompressor(BaseDocumentCompressor):
             )
             scores = util.cos_sim(query_embedding, document_embedding)[0]

-        docs_with_scores = list(zip(documents, scores.tolist()))
+        docs_with_scores = list(zip(documents, list(scores)))
         if self.r_score:
             docs_with_scores = [
                 (d, s) for d, s in docs_with_scores if s >= self.r_score
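
A slightly more defensive variant (just a sketch, not the merged fix) would normalize both cases (a numpy array or tensor from the local cross-encoder, and a plain list from the remote reranker) before zipping:

# Accept either an array-like with .tolist() or a plain list of floats.
scores = scores.tolist() if hasattr(scores, "tolist") else list(scores)
docs_with_scores = list(zip(documents, scores))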

I also think we need async or multithreaded calls to the external reranker to speed up execution.

Author
Owner

@tjbck commented on GitHub (May 10, 2025):

Addressed with https://github.com/open-webui/open-webui/pull/13751, other enhancement PRs welcome!
