issue: Qdrant connection timeout (Add as configurable ENV variable?) #5965


GiteaMirror · 2025-11-11T16:40:22-06:00


Originally created by @galvanoid on GitHub (Aug 5, 2025).

Check Existing Issues

I have searched the existing issues and discussions.
I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

0.6.18

Ollama Version (if applicable)

0.9.3

Operating System

Ubuntu server 24.04

Browser (if applicable)

No response

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have provided every relevant configuration, setting, and environment variable used in my setup.
I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
Start with the initial platform/version/OS and dependencies used,
Specify exact install/launch/configure commands,
List URLs visited, user input (incl. example values/emails/passwords if needed),
Describe all options and toggles enabled or changed,
Include any files or environmental changes,
Identify the expected and actual result at each stage,
Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Queries to Qdrant (hybrid search with a reranker over large document sets) should support a timeout configurable via an environment variable, to prevent httpx.ReadTimeout errors when queries exceed the default timeout.

Actual Behavior

When using hybrid search (with a reranker) on a large knowledge base (~30k documents), I consistently get httpx.ReadTimeout from the Qdrant client in Open WebUI.

qdrant_client.http.exceptions.ResponseHandlingException: timed out
httpx.ReadTimeout: timed out

Steps to Reproduce

Deploy Open WebUI using Docker.

Set up a Qdrant instance (Docker) with a large collection (e.g., 30,000 documents).

Link the collection to OpenWebUI as a knowledge base.

Enable hybrid search (reranker).

Send a prompt that triggers hybrid search over that collection.

Observe the crash with httpx.ReadTimeout in the Docker logs.

Logs & Screenshots

httpx.ReadTimeout: timed out
...
File "/app/backend/open_webui/retrieval/utils.py", line 345, in query_collection_with_hybrid_search
collection_results[collection_name] = VECTOR_DB_CLIENT.get(
...
qdrant_client.http.exceptions.ResponseHandlingException: timed out

Additional Information

There is currently no environment variable exposed to control the timeout used by the Qdrant Python client (QdrantClient(timeout=...)), and the default is insufficient for large hybrid search workloads.

Suggestion:
Add a new environment variable, e.g. QDRANT_CLIENT_TIMEOUT, so users can configure this timeout in the Docker container without modifying the code. This would improve flexibility and avoid silent failures when the reranker is used on large datasets.
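A minimal sketch of the suggestion, assuming the variable is named QDRANT_CLIENT_TIMEOUT as proposed above and holds a whole number of seconds. The name and parsing rules are illustrative only (#16419 may implement this differently), and read_qdrant_timeout and make_qdrant_client are hypothetical helpers, not Open WebUI functions. The only library call is the QdrantClient constructor with its url, api_key and timeout parameters. When the variable is unset, timeout is not passed at all, so the client keeps its own default.

```python
import os
from typing import Optional

# Variable name taken from the suggestion above (assumption, not a confirmed setting).
ENV_VAR = "QDRANT_CLIENT_TIMEOUT"


def read_qdrant_timeout(default: Optional[int] = None) -> Optional[int]:
    """Return the timeout in seconds from the environment, or `default` if unset/empty."""
    raw = os.environ.get(ENV_VAR, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError as exc:
        raise ValueError(f"{ENV_VAR} must be a whole number of seconds, got {raw!r}") from exc
    if value <= 0:
        raise ValueError(f"{ENV_VAR} must be positive, got {value}")
    return value


def make_qdrant_client(url: str, api_key: Optional[str] = None):
    """Build a QdrantClient, forwarding the timeout only when one is configured."""
    # Imported lazily so read_qdrant_timeout() works without qdrant-client installed.
    from qdrant_client import QdrantClient

    kwargs = {"url": url, "api_key": api_key}
    timeout = read_qdrant_timeout()
    if timeout is not None:
        kwargs["timeout"] = timeout
    return QdrantClient(**kwargs)
```

With something like this in place, the container could be started with e.g. QDRANT_CLIENT_TIMEOUT=60 in its environment. Raising the timeout only lets slow queries finish; it does not make them faster.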


GiteaMirror added the bug label 2025-11-11 16:40:23 -06:00

GiteaMirror closed this issue

2025-11-11 16:40:23 -06:00

GiteaMirror commented

2025-11-11 16:40:24 -06:00

@expruc commented on GitHub (Aug 9, 2025):

I have created #16419 to address your proposal. However, it doesn't fix the real issue, which lies in the way hybrid mode is implemented. The timeouts occur because all the documents in the collection must be fetched before the reranking logic is applied, as in the following code:
https://github.com/open-webui/open-webui/blob/b8da4a8cd8257d4846f3608e299618a0b4f185ed/backend/open_webui/retrieval/utils.py#L339-L354
and specifically line 345. Fetching all the points of a large collection takes a long time (up to 20 seconds on a collection of 200k points with vector size 384, and growing as more vectors are added), even with server-side optimizations.
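To make the cost concrete, here is a minimal, self-contained sketch of the fetch-everything pattern described above. fetch_page is a hypothetical stand-in for a cursor-paginated API such as Qdrant's scroll, and lexical_top_k is a crude term-overlap score standing in for the keyword scoring that runs over the fetched documents (Open WebUI's actual scorer is not reproduced here). What it shows: the number of round trips grows with the collection size, and no scoring can start until the last page has arrived, however small the requested top-k is.

```python
from typing import Callable, List, Optional, Tuple

# A page is (texts, next_cursor); next_cursor is None after the last page.
Page = Tuple[List[str], Optional[int]]


def fetch_all(
    fetch_page: Callable[[Optional[int], int], Page], page_size: int = 256
) -> Tuple[List[str], int]:
    """Drain a cursor-paginated collection; return (all texts, number of round trips)."""
    texts: List[str] = []
    cursor: Optional[int] = None
    round_trips = 0
    while True:
        page, cursor = fetch_page(cursor, page_size)
        round_trips += 1
        texts.extend(page)
        if cursor is None:
            return texts, round_trips


def lexical_top_k(query: str, texts: List[str], k: int = 3) -> List[str]:
    """Crude term-overlap ranking; every fetched text is scored, so this is linear too."""
    terms = set(query.lower().split())
    scored = [(sum(word in terms for word in text.lower().split()), text) for text in texts]
    # Stable sort: equal scores keep collection order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:k] if score > 0]
```

With 1,000 documents and a page size of 256, fetch_all needs 4 round trips before lexical_top_k can run; at 200k points the same loop needs hundreds.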

This means the client must wait all that time before the LLM starts answering the prompt, which is less than ideal.
One solution might be to use Qdrant's built-in hybrid search, but this requires creating collections with hybrid mode enabled, and perhaps a new VECTOR_DB_CLIENT method for querying/searching in hybrid mode.
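For comparison, Qdrant's built-in hybrid search runs both retrievals on the server and fuses them there, so only the fused top results are returned instead of the whole collection. In recent qdrant-client versions this is expressed through query_points with Prefetch entries (for example one sparse and one dense) and a FusionQuery using Reciprocal Rank Fusion (RRF). The pure-Python function below only illustrates what RRF computes over best-first lists of point IDs; it is not Qdrant's implementation, and k=60 is the constant from the original RRF paper, which may differ from the one Qdrant uses.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60, limit: int = 10
) -> List[Hashable]:
    """Fuse several best-first ranked lists of IDs with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) to every ID it contains, with rank starting at 1.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=scores.__getitem__, reverse=True)[:limit]
```

For example, fusing a dense ranking [a, b, c] with a sparse ranking [c, a, d] yields a, c, b, d: the IDs found by both retrievers rise to the top.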
