issue: Severe slowdown when collections are associated to a model (vs. passing knowledges per request). Model editor also takes ~5 minutes to open with large collections #6338
Originally created by @galvanoid on GitHub (Sep 7, 2025).
Originally assigned to: @tjbck on GitHub.
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.6.26
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24.04
Browser (if applicable)
Chrome, Edge, Firefox, Chromium
Confirmation
Expected Behavior
Expected:
Comparable latency in both modes: associated (knowledge collections associated with the model) vs. per-request (knowledge collections sent with each request).
The model editor should open within seconds, regardless of associated collections’ size.
Actual Behavior
Associated mode: high latency; model editor takes minutes to open (e.g., ~5 min).
Per-request mode: fast and stable.
Steps to Reproduce
Case A — Slow: collections associated to the model
In OWUI, associate the large collection(s) (50k–100k documents) with the model (in the admin settings).
Send a basic chat/RAG query.
Observe very high latency (many seconds/minutes).
Open the model editor for that model: the page takes ~5 minutes to become interactive when large collections are attached. During that time, any LLM request that is streaming stalls.
In this case (knowledge collections associated with the model), the response takes up to 10 minutes or more to begin.
Case B — Fast: pass collections per request
Ensure the model has no associated collections.
Call the OWUI chat API with the same model, but include the collections in the request body (e.g., knowledges: [...]; a sketch of such a call follows this list).
Observe fast responses and a snappy model editor.
In this case, the response begins in about 1 minute.
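For reference, a minimal sketch of the per-request call, assuming a local OWUI instance and an API key with chat access; the knowledges field mirrors the example above, and its exact name/shape may differ across Open WebUI versions:

```python
# Minimal sketch of the fast per-request path (Case B).
# Assumptions: OWUI at localhost:3000, a valid API key, and a
# "knowledges" request-body field as in the example above; the exact
# field name/shape may vary by Open WebUI version.
import requests

OWUI_URL = "http://localhost:3000"       # assumed local OWUI instance
API_KEY = "sk-..."                       # placeholder API key
COLLECTION_ID = "large-collection-id"    # placeholder knowledge-collection id

resp = requests.post(
    f"{OWUI_URL}/api/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-model",             # a model with NO associated collections
        "messages": [{"role": "user", "content": "Summarize topic X."}],
        "knowledges": [{"id": COLLECTION_ID}],  # collections passed per request
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json())
```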
Same Qdrant, same collection, same embedder, same network. The only change is association vs. per-request collections.
Logs & Screenshots
Qdrant logs during:
model editor open (slow)
a query with associated collections (slow)
the same query with collections in request (fast)
Captured via docker logs -f.
(Look for /points/scroll vs. /search, limit, with_vector, with_payload sizes, etc.; a sketch of the lean /search shape follows this section.)
OWUI logs with debug level (LOG_LEVEL=debug or similar) to see which calls are made when opening the editor and when building the query in the associated path.
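For comparison, this is roughly what the lean search shape looks like as a direct call, assuming the qdrant-client Python library; the collection name and query vector are placeholders:

```python
# Sketch of a lean Qdrant search (the fast shape noted above):
# small limit, no vectors returned, minimal payload include.
# Assumptions: qdrant-client installed, Qdrant at localhost:6333.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

query_vector = [0.0] * 1024  # placeholder; the report uses a 1024-D embedder

hits = client.search(
    collection_name="my_collection",    # placeholder collection name
    query_vector=query_vector,
    limit=5,                            # small limit
    with_payload=["text", "source"],    # minimal payload include
    with_vectors=False,                 # do not return stored vectors
)
for hit in hits:
    print(hit.id, hit.score)
```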
Additional Information
What I Tested / Ruled Out
Qdrant is healthy: status: green; direct /search calls are fast when using sensible params (with_vector=false, minimal with_payload.include, small limit).
Network is not the issue: OWUI and Qdrant are co-located or on the same LAN; negligible RTT.
Embedder is fine: snowflake-arctic-embed2 (1024D) matches collection; no on-the-fly re-embedding.
Payload indices: I tried adding/removing payload indexes (source, file_id, start_index; see the sketch after this list). The associated vs. per-request gap remains.
RAG optimizations (e.g., filtering to start_index=0 for summaries, “summary-mirror” collections, pre/post-retrieval hooks) speed up Qdrant, but the slowdown only appears when collections are associated to the model.
Extra symptom: opening the model editor with large associated collections is extremely slow ⇒ likely doing heavy enumeration/scroll or a large prefetch to build UI metadata.
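For reference, the payload-index experiment above looks roughly like this with the qdrant-client library; the field names are the ones listed in the report, and the schema types are assumptions:

```python
# Sketch of the payload-index experiment mentioned above.
# Field names come from the report; the schema types are assumptions.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

for field, schema in [
    ("source", models.PayloadSchemaType.KEYWORD),
    ("file_id", models.PayloadSchemaType.KEYWORD),
    ("start_index", models.PayloadSchemaType.INTEGER),
]:
    client.create_payload_index(
        collection_name="my_collection",  # placeholder collection name
        field_name=field,
        field_schema=schema,
    )

# Removing an index again (the report tried both directions):
client.delete_payload_index(collection_name="my_collection", field_name="file_id")
```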
Additional Observation: Qdrant activity starts immediately with per-request collections, but is delayed when collections are associated to the model
Using btop to watch CPU:
Per-request collections (fast path): right after I send the chat request, the Qdrant process ramps up CPU immediately (all 30 cores light up in under ~1 s).
Associated collections (slow path): after I send the same request, there’s a noticeable idle gap before Qdrant shows any CPU activity. Only after several seconds does Qdrant begin to work.
Hypothesis for Maintainers
In the “associated collections” path, OWUI might:
Prefetch many points/chunks to build file lists, previews, or counts (possibly via /points/scroll), with a liberal with_payload, with_vector=true, or a high limit.
Issue N+1 queries per collection to compute aggregates (e.g., deduplication of file_id), instead of using lighter endpoints (e.g., /points/count) or lazy evaluation; a sketch contrasting the two patterns follows this list.
Use search_params.exact=true or defaults that trigger full scans.
In contrast, the “per-request” path seems to use a leaner retrieval: typically 1–2 /search calls with with_vector=false, minimal payload.include, and small limit, hence fast.
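To make the hypothesized gap concrete, here is a sketch of the two access patterns using the qdrant-client library; this illustrates the hypothesis only and is not Open WebUI's actual code:

```python
# Hypothesis illustration only — NOT Open WebUI's actual code.
# Heavy pattern: enumerate every point via /points/scroll to build
# file lists/counts; on a 100k-point collection this pages through
# the entire collection. Light pattern: a single /points/count call.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
COLLECTION = "my_collection"  # placeholder collection name

# Heavy: full scroll to deduplicate file_id across all points.
file_ids = set()
offset = None
while True:
    points, offset = client.scroll(
        collection_name=COLLECTION,
        limit=1000,
        with_payload=["file_id"],
        with_vectors=False,
        offset=offset,
    )
    file_ids.update(p.payload.get("file_id") for p in points)
    if offset is None:
        break
print("files (heavy scroll):", len(file_ids))

# Light: approximate total point count in one request.
print("points (light count):", client.count(COLLECTION, exact=False).count)
```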
@silentoplayz commented on GitHub (Oct 21, 2025):
Related - https://github.com/open-webui/open-webui/issues/17998
@deliciousbob commented on GitHub (Oct 24, 2025):
Hi @galvanoid, we have the same issue with 70K files; it already starts to get worse at around 10K files.
There is always a noticeable delay before the request is sent to the API endpoints / vector DB.
The delay seems to correlate with the loading time of the knowledge list in the Chat (via + or #) and Workspace/Knowledge.
Our wait time is approx. 15–30 s before a list loads or the request is sent to the endpoints.
One of the collaborators has already created a workaround to disable file listing in the chat (see https://github.com/open-webui/open-webui/pull/18292); this reduces the listing of the knowledge collections, but I assume this has to be extended to more places in the code (e.g., after sending your prompt).