mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-11 00:04:08 -05:00
issue: Qdrant connection timeout (Add as configurable ENV variable?) #5965
Originally created by @galvanoid on GitHub (Aug 5, 2025).
Installation Method
Git Clone
Open WebUI Version
0.6.18
Ollama Version (if applicable)
0.9.3
Operating System
Ubuntu server 24.04
Browser (if applicable)
No response
Expected Behavior
Queries to Qdrant (hybrid search with reranker and large document sets) should support configurable timeouts via environment variables, to prevent httpx.ReadTimeout errors when queries exceed default duration.
Actual Behavior
When using hybrid search (reranker) with a large knowledge base (~30k documents), I consistently get httpx.ReadTimeout from the Qdrant client in OpenWebUI.
qdrant_client.http.exceptions.ResponseHandlingException: timed out
httpx.ReadTimeout: timed out
Steps to Reproduce
Deploy OpenWebUI using Docker.
Set up a Qdrant instance (Docker) with a large collection (e.g., 30,000 documents).
Link the collection to OpenWebUI as a knowledge base.
Enable hybrid search (reranker).
Send a prompt that triggers hybrid search over that collection.
Observe the crash with httpx.ReadTimeout in the Docker logs.
Logs & Screenshots
httpx.ReadTimeout: timed out
...
File "/app/backend/open_webui/retrieval/utils.py", line 345, in query_collection_with_hybrid_search
collection_results[collection_name] = VECTOR_DB_CLIENT.get(
...
qdrant_client.http.exceptions.ResponseHandlingException: timed out
Additional Information
There is currently no environment variable exposed to control the timeout used by the Qdrant Python client (QdrantClient(timeout=...)), and the default is insufficient for large hybrid search workloads.
Suggestion:
Add a new ENV variable, e.g. QDRANT_CLIENT_TIMEOUT, so that users can configure this timeout in the Docker container without modifying the code. This would improve flexibility and avoid silent failures when the reranker is used on large datasets.
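To make the suggestion concrete, here is a minimal sketch of how such a variable could be read and turned into keyword arguments for the client. The helper names `qdrant_timeout_from_env` and `qdrant_client_kwargs` are hypothetical and do not exist in Open WebUI. `QDRANT_CLIENT_TIMEOUT` is the name proposed above, and the only library detail assumed is that `QdrantClient` accepts a `timeout` argument in seconds.

```python
import os
from typing import Mapping, Optional

ENV_NAME = "QDRANT_CLIENT_TIMEOUT"  # name proposed in this issue, not an existing setting


def qdrant_timeout_from_env(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the timeout in seconds, or None if the variable is unset or empty."""
    raw = env.get(ENV_NAME, "").strip()
    if not raw:
        return None  # fall back to the client's own default
    try:
        timeout = int(raw)
    except ValueError:
        raise ValueError(f"{ENV_NAME} must be a whole number of seconds, got {raw!r}") from None
    if timeout <= 0:
        raise ValueError(f"{ENV_NAME} must be positive, got {timeout}")
    return timeout


def qdrant_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build extra keyword arguments for QdrantClient (hypothetical helper)."""
    kwargs = {}
    timeout = qdrant_timeout_from_env(env)
    if timeout is not None:
        kwargs["timeout"] = timeout
    return kwargs


# Possible usage, assuming qdrant-client is installed and a server URL is known:
#   from qdrant_client import QdrantClient
#   client = QdrantClient(url="http://localhost:6333", **qdrant_client_kwargs())
```

With this shape, leaving the variable unset keeps the current behavior, while something like `-e QDRANT_CLIENT_TIMEOUT=60` on the container would raise the limit.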
@expruc commented on GitHub (Aug 9, 2025):
I have created #16419 to address your proposal. However, it doesn't fix the real issue, which lies in the way hybrid mode is implemented. The timeouts occur because all the documents in the collection have to be fetched before the reranking logic is applied, as in the following code:
b8da4a8cd8/backend/open_webui/retrieval/utils.py (L339-L354)
and specifically in line 345. Collecting all the points in a large collection takes a long time (up to 20 seconds on a collection of 200k points with a vector size of 384, and longer as more vectors are added), even with server optimizations.
This means the client must wait all that time before the LLM starts answering the prompt, which is less than ideal.
One solution might be to use the built-in Qdrant hybrid search, but this requires implementing collection creation with hybrid mode enabled, and perhaps a new VECTOR_DB_CLIENT method for querying/searching in hybrid mode.
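
For illustration only, the sketch below shows the general idea behind fusing results from two retrievers without fetching the whole collection: each retriever returns only a bounded top-k list of IDs, and the lists are merged with reciprocal rank fusion (RRF), one common fusion rule. This is not Open WebUI's code and not a description of Qdrant's internals. The function name, the constant `k=60`, and the sample IDs are assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]], k: int = 60, limit: int = 10
) -> List[str]:
    """Merge several ranked ID lists; each list adds 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:limit]]


# Hypothetical top-3 results from a dense (vector) and a sparse (keyword) retriever.
dense_top_k = ["a", "b", "c"]
sparse_top_k = ["b", "d", "a"]
print(reciprocal_rank_fusion([dense_top_k, sparse_top_k]))
```

The point of the design is that the work done on the client scales with k rather than with the collection size, which is what makes a server-side hybrid query attractive for large knowledge bases.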