[GH-ISSUE #16896] issue: HybridSearch ZeroDivisionError: division by zero #33619
Originally created by @Schwenn2002 on GitHub (Aug 25, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/16896
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.25
Ollama Version (if applicable)
ollama 0.11.6
Operating System
Ubuntu 24.04
Browser (if applicable)
chrome
Confirmation
README.md.
Expected Behavior
The expected result is that documents from the RAG collection are found and the API returns the corresponding chunks.
Actual Behavior
Querying the RAG collection fails with HTTP status 400, and the Docker log shows an error from the hybrid search.
Tested via the API:
POST /api/v1/retrieval/query/collection
Steps to Reproduce
Go to the URL /docs#/retrieval/query_collection_handler_api_v1_retrieval_query_collection_post and execute a test API query.
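The same test query can be sent without the Swagger UI. This is a minimal sketch that only builds the request. It assumes the server listens on http://localhost:3000, that an API key is sent as a Bearer token, and that the body uses the fields collection_names, query, k and hybrid shown in the /docs schema for this handler. Verify those names against your Open WebUI version; the helper name is hypothetical.

```python
import json
import urllib.request


def build_query_collection_request(base_url, token, collection_names, query, k=4, hybrid=None):
    """Build (but do not send) a POST request for the collection query endpoint.

    Hypothetical helper; the body field names are assumptions taken from the
    /docs schema and may differ between Open WebUI versions.
    """
    body = {"collection_names": collection_names, "query": query, "k": k}
    if hybrid is not None:
        # Send a real JSON boolean rather than the string "false".
        body["hybrid"] = hybrid
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/retrieval/query/collection",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Collection ID and query taken from the log below; the port is an assumption.
req = build_query_collection_request(
    "http://localhost:3000",
    "YOUR_API_KEY",
    ["3ce4b804-6c65-46c2-b93c-b6aec5345742"],
    "Liegt ein IKT-Managementrahmen vor",
)
# To actually send it: urllib.request.urlopen(req)
```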
Logs & Screenshots
██████╗ ██████╗ ███████╗███╗ ██╗ ██╗ ██╗███████╗██████╗ ██╗ ██╗██╗
██╔═══██╗██╔══██╗██╔════╝████╗ ██║ ██║ ██║██╔════╝██╔══██╗██║ ██║██║
██║ ██║██████╔╝█████╗ ██╔██╗ ██║ ██║ █╗ ██║█████╗ ██████╔╝██║ ██║██║
██║ ██║██╔═══╝ ██╔══╝ ██║╚██╗██║ ██║███╗██║██╔══╝ ██╔══██╗██║ ██║██║
╚██████╔╝██║ ███████╗██║ ╚████║ ╚███╔███╔╝███████╗██████╔╝╚██████╔╝██║
╚═════╝ ╚═╝ ╚══════╝╚═╝ ╚═══╝ ╚══╝╚══╝ ╚══════╝╚═════╝ ╚═════╝ ╚═╝
v0.6.25 - building the best AI user interface.
https://github.com/open-webui/open-webui
Fetching 22 files: 100%|██████████| 22/22 [00:00<00:00, 75697.04it/s]
Fetching 13 files: 100%|██████████| 13/13 [00:00<00:00, 46843.60it/s]
INFO: Started server process [1]
INFO: Waiting for application startup.
2025-08-25 14:29:34.620 | ERROR | open_webui.retrieval.utils:query_doc_with_hybrid_search:193 - Error querying doc 3ce4b804-6c65-46c2-b93c-b6aec5345742 with hybrid search: division by zero
Traceback (most recent call last):
File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7487f3740a40>
└ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function Thread.run at 0x7487f3740720>
└ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
File "/usr/local/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
│ │ │ └ (<weakref at 0x7484756ad3f0; to 'ThreadPoolExecutor' at 0x748477acb050>, <_queue.SimpleQueue object at 0x748475569f30>, None,...
│ │ └ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
│ └ <function _worker at 0x7487f2814a40>
└ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7487f2814b80>
└ <concurrent.futures.thread._WorkItem object at 0x748477aaea50>
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at 0x748477aaea50>
│ │ │ └ ('3ce4b804-6c65-46c2-b93c-b6aec5345742', 'Liegt ein IKT-Managementrahmen vor')
│ │ └ <concurrent.futures.thread._WorkItem object at 0x748477aaea50>
│ └ <function query_collection_with_hybrid_search.<locals>.process_query at 0x7484756ce520>
└ <concurrent.futures.thread._WorkItem object at 0x748477aaea50>
File "/app/backend/open_webui/retrieval/utils.py", line 364, in process_query
result = query_doc_with_hybrid_search(
└ <function query_doc_with_hybrid_search at 0x7484ec289b20>
File "/usr/local/lib/python3.11/site-packages/langchain_community/retrievers/bm25.py", line 64, in from_texts
vectorizer = BM25Okapi(texts_processed, **bm25_params)
│ │ └ {}
│ └ []
└ <class 'rank_bm25.BM25Okapi'>
File "/usr/local/lib/python3.11/site-packages/rank_bm25.py", line 83, in __init__
super().__init__(corpus, tokenizer)
│ └ None
└ []
File "/usr/local/lib/python3.11/site-packages/rank_bm25.py", line 27, in __init__
nd = self._initialize(corpus)
│ │ └ []
│ └ <function BM25._initialize at 0x7484755f0cc0>
└ <rank_bm25.BM25Okapi object at 0x7484755e1a90>
File "/usr/local/lib/python3.11/site-packages/rank_bm25.py", line 52, in _initialize
self.avgdl = num_doc / self.corpus_size
│ │ │ │ └ 0
│ │ │ └ <rank_bm25.BM25Okapi object at 0x7484755e1a90>
│ │ └ 0
│ └ 0
└ <rank_bm25.BM25Okapi object at 0x7484755e1a90>
ZeroDivisionError: division by zero
2025-08-25 14:29:34.622 | ERROR | open_webui.retrieval.utils:process_query:377 - Error when querying the collection with hybrid_search: division by zero
Traceback (most recent call last):
File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7487f3740a40>
└ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function Thread.run at 0x7487f3740720>
└ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
File "/usr/local/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
│ │ │ └ (<weakref at 0x7484756ad3f0; to 'ThreadPoolExecutor' at 0x748477acb050>, <_queue.SimpleQueue object at 0x748475569f30>, None,...
│ │ └ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
│ └ <function _worker at 0x7487f2814a40>
└ <Thread(ThreadPoolExecutor-4_0, started 128105721099968)>
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
work_item.run()
│ └ <function _WorkItem.run at 0x7487f2814b80>
└ <concurrent.futures.thread._WorkItem object at 0x748477aaea50>
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <concurrent.futures.thread._WorkItem object at 0x748477aaea50>
│ │ │ └ ('3ce4b804-6c65-46c2-b93c-b6aec5345742', 'Liegt ein IKT-Managementrahmen vor')
│ │ └ <concurrent.futures.thread._WorkItem object at 0x748477aaea50>
│ └ <function query_collection_with_hybrid_search.<locals>.process_query at 0x7484756ce520>
└ <concurrent.futures.thread._WorkItem object at 0x748477aaea50>
File "/app/backend/open_webui/retrieval/utils.py", line 194, in query_doc_with_hybrid_search
raise e
File "/app/backend/open_webui/retrieval/utils.py", line 130, in query_doc_with_hybrid_search
bm25_retriever = BM25Retriever.from_texts(
│ └ <classmethod(<function BM25Retriever.from_texts at 0x7484ec288400>)>
└ <class 'langchain_community.retrievers.bm25.BM25Retriever'>
File "/usr/local/lib/python3.11/site-packages/langchain_community/retrievers/bm25.py", line 64, in from_texts
vectorizer = BM25Okapi(texts_processed, **bm25_params)
│ │ └ {}
│ └ []
└ <class 'rank_bm25.BM25Okapi'>
File "/usr/local/lib/python3.11/site-packages/rank_bm25.py", line 83, in __init__
super().__init__(corpus, tokenizer)
│ └ None
└ []
File "/usr/local/lib/python3.11/site-packages/rank_bm25.py", line 27, in __init__
nd = self._initialize(corpus)
│ │ └ []
│ └ <function BM25._initialize at 0x7484755f0cc0>
└ <rank_bm25.BM25Okapi object at 0x7484755e1a90>
File "/usr/local/lib/python3.11/site-packages/rank_bm25.py", line 52, in _initialize
self.avgdl = num_doc / self.corpus_size
│ │ │ │ └ 0
│ │ │ └ <rank_bm25.BM25Okapi object at 0x7484755e1a90>
│ │ └ 0
│ └ 0
└ <rank_bm25.BM25Okapi object at 0x7484755e1a90>
ZeroDivisionError: division by zero
2025-08-25 14:29:34.623 | ERROR | open_webui.routers.retrieval:query_collection_handler:2189 - Hybrid search failed for all collections. Using Non-hybrid search as fallback.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x7487f3740a40>
└ <WorkerThread(AnyIO worker thread, started 128106283116224)>
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
│ └ <function WorkerThread.run at 0x7484762d2700>
└ <WorkerThread(AnyIO worker thread, started 128106283116224)>
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 967, in run
result = context.run(func, *args)
│ │ │ └ ()
│ │ └ functools.partial(<function query_collection_handler at 0x7484ec31c220>, user=UserModel(id='4dcd7332-88b8-4ba4-b910-db56289de...
│ └ <method 'run' of '_contextvars.Context' objects>
└ <_contextvars.Context object at 0x748477aace40>
File "/app/backend/open_webui/retrieval/utils.py", line 400, in query_collection_with_hybrid_search
raise Exception(
Exception: Hybrid search failed for all collections. Using Non-hybrid search as fallback.
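The last two log entries describe a fallback: each collection is tried with hybrid search, and only when every collection fails does the handler switch to plain vector search. The following is a simplified, hypothetical sketch of that control flow. It is not Open WebUI's code, which spreads this across query_collection_with_hybrid_search and query_collection_handler. hybrid_search and vector_search are placeholder callables.

```python
from typing import Callable, List


def query_with_fallback(
    collection_names: List[str],
    query: str,
    hybrid_search: Callable[[str, str], list],
    vector_search: Callable[[str, str], list],
) -> list:
    """Try hybrid search per collection; if it fails for every collection,
    fall back to vector-only search (hypothetical, simplified sketch)."""
    results = []
    failures = 0
    for name in collection_names:
        try:
            results.extend(hybrid_search(name, query))
        except Exception as exc:  # e.g. ZeroDivisionError raised inside BM25
            failures += 1
            print(f"Error querying {name} with hybrid search: {exc}")
    if collection_names and failures == len(collection_names):
        print("Hybrid search failed for all collections. Using non-hybrid search as fallback.")
        results = []
        for name in collection_names:
            results.extend(vector_search(name, query))
    return results


def broken_hybrid(name, query):
    # Simulates the BM25 failure from the log.
    raise ZeroDivisionError("division by zero")


def vector_only(name, query):
    return [f"{name}: vector hit for {query!r}"]


print(query_with_fallback(["docs"], "test", broken_hybrid, vector_only))
```

If only some collections fail, the sketch keeps the hybrid results from the others; the fallback is reserved for the all-failed case that the log reports.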
Additional Information
No response
@rgaricano commented on GitHub (Aug 25, 2025):
That error happens when hybrid search returns empty sets or malformed similarity scores for your specific collection data.
Probably it didn't find any document with a relevance score within the range indicated in the request.
By the way, I made a PR to fix the hybrid parameter, because it is not handled correctly: if the request has "hybrid": "false", hybrid search should be skipped regardless of other parameters.
PR fixing both: https://github.com/open-webui/open-webui/pull/16901
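The rule described above means an explicit per-request value should override the global hybrid-search setting. The following is a hypothetical sketch of that precedence, not the code from the linked PR. It also accepts the string forms "true"/"false". Pydantic would normally coerce those strings to booleans. If the value is read from a raw dict instead, a bare "false" string is truthy in Python and would wrongly enable hybrid search.

```python
from typing import Optional, Union


def resolve_hybrid(request_hybrid: Optional[Union[bool, str]], config_enabled: bool) -> bool:
    """Decide whether to run hybrid search (hypothetical helper).

    An explicit value in the request wins over the global configuration;
    only when the request omits it does the configured default apply.
    """
    if request_hybrid is None:
        return config_enabled
    if isinstance(request_hybrid, str):
        # bool("false") would be True, so parse the string explicitly.
        return request_hybrid.strip().lower() == "true"
    return bool(request_hybrid)


print(resolve_hybrid("false", config_enabled=True))
print(resolve_hybrid(None, config_enabled=True))
```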
@mahenning commented on GitHub (Aug 26, 2025):
The error points to BM25, which runs before any similarity scores are computed. The line
self.avgdl = num_doc / self.corpus_size
throws it (related: https://github.com/run-llama/llama_index/issues/9024). This means the corpus passed to BM25 is empty: there are zero documents (no text) to index. Is the document scanned or something? Can you test with another document?
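The failing line can be reproduced in isolation. This simplified sketch mirrors only the average-document-length part of rank_bm25's BM25._initialize, without importing the library. Despite its name, num_doc counts tokens across all documents, while corpus_size counts documents. An empty corpus, like the [] visible in the traceback, therefore divides 0 by 0.

```python
from typing import List


def bm25_average_doc_length(corpus: List[List[str]]) -> float:
    """Simplified mirror of the avgdl computation in rank_bm25's BM25._initialize."""
    num_doc = 0      # total number of tokens across all documents
    corpus_size = 0  # number of documents
    for document in corpus:
        num_doc += len(document)
        corpus_size += 1
    return num_doc / corpus_size  # raises ZeroDivisionError for an empty corpus


print(bm25_average_doc_length([["ikt", "managementrahmen"], ["risiko"]]))  # 1.5
try:
    bm25_average_doc_length([])
except ZeroDivisionError as exc:
    print("empty corpus:", exc)  # empty corpus: division by zero
```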
@tjbck commented on GitHub (Aug 26, 2025):
This occurs when the collection does not contain any documents; addressed in dev so that an error message is displayed instead.
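One way to surface a clear message instead of the ZeroDivisionError is to check the texts before they ever reach BM25Retriever.from_texts. This is a hypothetical guard, and the error class and helper names are illustrative only. It does not reproduce the fix that landed in dev.

```python
from typing import List


class EmptyCollectionError(ValueError):
    """Raised when a collection has no text to index (hypothetical)."""


def texts_for_bm25(collection_name: str, texts: List[str]) -> List[str]:
    """Return the non-blank texts, or raise a descriptive error instead of
    letting BM25 fail with a bare ZeroDivisionError (hypothetical guard)."""
    non_empty = [t for t in texts if t and t.strip()]
    if not non_empty:
        raise EmptyCollectionError(
            f"Collection {collection_name!r} contains no extractable text; "
            "check that the document was processed (e.g. OCR for scanned PDFs)."
        )
    return non_empty


try:
    texts_for_bm25("3ce4b804-6c65-46c2-b93c-b6aec5345742", [])
except EmptyCollectionError as exc:
    print(exc)
```

Raising a dedicated subclass of ValueError lets a caller catch it specifically. For example, the handler could return a readable 400 response or skip straight to vector-only search without catching every exception.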