Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 11:28:35 -05:00)
[GH-ISSUE #17998] issue: 70K Documents in multiple Collections causes severe slowdown #57125
Originally created by @deliciousbob on GitHub (Oct 2, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/17998
Originally assigned to: @tjbck on GitHub.
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.30
Ollama Version (if applicable)
No response
Operating System
Docker Container
Browser (if applicable)
No response
Confirmation
Expected Behavior
Below 10K documents, opening the Workspace Knowledge Collections was fast (loaded in 1-3 sec.)
Selecting a Knowledge Collection in the chat was also fast (loaded in 1-3 sec.)
Actual Behavior
After uploading 70K documents the whole system got very slow; everything related to documents became extremely slow.
Steps to Reproduce
Good Day Community.
I have some serious performance issues after uploading about 70K documents (2-4 pages each).
I noticed the selection in the chat is terribly slow now. It takes 15-25 sec. to load the knowledge and chat.
I use PostgreSQL as the DB backend for OWUI (both Docker containers on the same VM, running on M.2 disks, so network and disk performance are very good)
and another PGVector DB for the vector data on another VM on the same host (so network and disk performance are good there too).
I noticed that the knowledge selection in the chat shows a huge list of collections and files.
- Is there a way to only show collections?
Logs & Screenshots
I can provide if needed
Additional Information
PostgreSQL v17 as OWUI DB
PGVector v17 as VectorDB
@deliciousbob commented on GitHub (Oct 2, 2025):
My assumption is that the PostgreSQL for OWUI is causing the slowdown.
Is there any recommendation for the PostgreSQL config ? Thx
@Classic298 commented on GitHub (Oct 2, 2025):
Yes: don't use PostgreSQL for such large file amounts.
Sorry, no other recommendation here.
Better use Qdrant (multitenancy mode) or Milvus (multitenancy mode)
@ka-admin commented on GitHub (Oct 2, 2025):
I'm using qdrant and the situation is just the same
@Classic298 commented on GitHub (Oct 2, 2025):
@ka-admin which qdrant? and what indexing type? and multitenancy or not?
@ka-admin commented on GitHub (Oct 2, 2025):
@Classic298

```shell
docker run -d --name open-webui --network host --restart always \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://192.168.127.20:11434 \
  -e VECTOR_DB=qdrant \
  -e QDRANT_URI=http://192.168.127.20:6333 \
  -e ENABLE_QDRANT_MULTITENANCY_MODE=true \
  -e QDRANT_TIMEOUT=300 \
  -e RAG_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B" \
  -e RAG_RERANKING_MODEL="Qwen/Qwen3-Reranker-0.6B" \
  -e RAG_TOP_K=20 \
  -e RAG_TOP_K_RERANKER=20 \
  -e CHUNK_SIZE=1024 \
  -e CHUNK_OVERLAP=100 \
  -e WEB_LOADER_ENGINE=playwright \
  -e USER_AGENT='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36' \
  ghcr.io/open-webui/open-webui:main
```
qdrant -V
qdrant 1.15.1
@Classic298 commented on GitHub (Oct 2, 2025):
hmmm weird.
@ka-admin commented on GitHub (Oct 2, 2025):
@Classic298 I'll tell you more: inserting one document (a code file, text format) into such a collection takes up to 5-6 seconds. It takes me a day to upload 5000 code files into a collection. CUDA accelerated, NVMe storage of course. When the collection was new it took only about an hour or maybe two to insert 5000 files. As the collection started to grow, insertion speed slowed down dramatically. It looks like there is no caching mechanism (or it works very ineffectively) for telling whether a document is already in the database (hashing or something like that). It looks like every insertion has to re-check all previously inserted files just to tell whether the new file is a potential duplicate. The more files in the collection, the more checks Open WebUI has to do; it is only my suggestion, I'm not sure if it's true or not. But it is what it is: managing large collections is a pain in the back.
@Classic298 commented on GitHub (Oct 2, 2025):
what indexing type do you have configured in qdrant? specs of your machine? especially memory and storage
@deliciousbob commented on GitHub (Oct 2, 2025):
Thx for the quick response.
I am currently testing: PGVector seems to be fine here, I can retrieve vectors within seconds,
so it seems that the file handling or the backend DB of OWUI causes the issue.
I am investigating that further.
I've tested Qdrant and Milvus too; I am by far no expert, but RAM usage seemed to grow quite fast when importing a lot of vectors.
My upload procedure is currently done via the API: I upload the file and then add it to the collection. That works well at 3 sec/document.
The file handling seems to be the issue. I am investigating further and will let you know.
@Classic298 commented on GitHub (Oct 2, 2025):
yes that is intended, as these vector databases are much faster if they use the RAM
@deliciousbob commented on GitHub (Oct 2, 2025):
I've now tested direct queries to my PGVector DB; I get a response within 2-3 sec. (limit 30 / L2 distance / 1024 dimensions / 70K docs). When querying with cosine distance I get a response instantly, within 1 sec.
PGVector seems to be working well and can handle large datasets while consuming less than 6 GB RAM.
My next tests will be on the PostgreSQL backend DB for OWUI, as this is most likely storing the text and the storage location of the files.
@deliciousbob commented on GitHub (Oct 6, 2025):
I did some checks on the backend DB (pg:17)
I checked the requests from OWUI on opening the Workspace -> KCs :
I saw the following SELECT (I could not display the full request as it was too long for the logs):
--> It seems to me that selecting all related documents by enumerating them (WHERE file.id IN ...) is a very inefficient way of querying.
When I checked the `knowledge` table in the DB, I saw that every knowledge entry stores the related file_ids directly in the `data` column:
{"file_ids": ["c40cb6f9-c54e-4c4e-9662-656b63c889ae", "e913c33d-527b-4b72-a8d1-7370b331b439", "96e09c3b-46b9-4b13-ad49-79424b1c44c6", "31e96b1c-6151-4b71-8430-51518a5f3bd0", "b3ee8b6f-9ad2-4e47-963d-3a55fb95f7ba", "6bec315d-3b6a-40ca-b4dd-df462da5a09d", "ef2fd808-90ca-4edc-9306-b0ff35180923", "4bb10c8d-eb8e-4a9f-bb87-3c9c23d517ff", "8265c050-1db3-46e7-93c0-bf9140cd7826", "bd4bc637-809a-40f7-845d-8572be45ee0e", "36a5c879-b43f-4f2a-b81d-1008f8f6b485", "f9743037-0994-4216-af9c-
I am by far no DB expert, and I have the least experience with PostgreSQL, but this seems like a very inefficient way of storing relations. I am used to having something like a relationship table in between, similar to:
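A minimal sketch of what such a junction table could look like (hypothetical table and column names, shown with SQLite purely for illustration, not Open WebUI's actual schema):

```python
import sqlite3

# Hypothetical schema: a knowledge_file junction table instead of a
# JSON array of file ids stored inside the knowledge row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knowledge (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT);
CREATE TABLE knowledge_file (
    knowledge_id TEXT REFERENCES knowledge(id),
    file_id      TEXT REFERENCES file(id),
    PRIMARY KEY (knowledge_id, file_id)
);
""")
conn.execute("INSERT INTO knowledge VALUES ('kc1', 'JIRA IT-HELP')")
conn.executemany("INSERT INTO file VALUES (?, ?)",
                 [("f1", "a.pdf"), ("f2", "b.pdf")])
conn.executemany("INSERT INTO knowledge_file VALUES (?, ?)",
                 [("kc1", "f1"), ("kc1", "f2")])

# One indexed join retrieves a collection's files; no JSON parsing needed.
rows = conn.execute("""
    SELECT f.id, f.filename FROM file f
    JOIN knowledge_file kf ON kf.file_id = f.id
    WHERE kf.knowledge_id = ?
    ORDER BY f.id
""", ("kc1",)).fetchall()
print(rows)
```

With this layout, fetching a collection's files is one indexed join instead of parsing a JSON array out of a text column.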
Key Findings and Recommendations:
This would dramatically enhance the speed of RAG handling in OWUI in larger environments with some thousand documents.
VectorDB retrieval is not the issue, as I mentioned in my previous comment; it seems to be the backend DB, and especially the document retrieval probably needs an overhaul. (Please correct me if I am wrong.)
Let me know if you need further info, i would be happy to help on solving that issue.
Thank you, best regards, Robert
@rgaricano commented on GitHub (Oct 6, 2025):
Other possible optimizations
1. Batch Fetching Across Knowledge Collections
Current Problem: The knowledge listing endpoints (`GET /api/knowledge/` and `GET /api/knowledge/list`) iterate through each knowledge collection and make separate queries for file metadata. (`4d7fddaf7e/backend/open_webui/routers/knowledge.py`, L42-L86)
Implementation: Collect all file IDs upfront, fetch them in a single query, then distribute the results.
This reduces N queries to 1 query.
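A sketch of the pattern (hypothetical schema and helper names, demonstrated with SQLite; the real change would go through Open WebUI's SQLAlchemy models):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT)")
conn.executemany("INSERT INTO file VALUES (?, ?)",
                 [(f"f{i}", f"doc{i}.pdf") for i in range(6)])

# Hypothetical knowledge collections, each holding a list of file ids.
collections = {"kc1": ["f0", "f1"], "kc2": ["f2", "f3", "f4"]}

# Instead of one query per collection (N queries), collect every file id
# up front and fetch all metadata in a single IN (...) query.
all_ids = [fid for ids in collections.values() for fid in ids]
placeholders = ",".join("?" * len(all_ids))
rows = conn.execute(
    f"SELECT id, filename FROM file WHERE id IN ({placeholders})", all_ids
).fetchall()
by_id = {fid: name for fid, name in rows}

# Distribute the single result set back to each collection.
files_per_collection = {
    kc: [{"id": fid, "filename": by_id[fid]} for fid in ids]
    for kc, ids in collections.items()
}
print(files_per_collection["kc1"])
```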
2. Selective Column Loading with SQLAlchemy
Current Problem: `get_file_metadatas_by_ids()` fetches all columns, including the large JSON blobs in `file.data` and `file.meta`. (`4d7fddaf7e/backend/open_webui/models/files.py`, L180-L193)
Implementation: Add a new method with deferred loading.
This avoids transferring large JSON blobs from PostgreSQL when only metadata is needed.
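In SQLAlchemy this would typically use the `load_only()`/`defer()` column options; the underlying effect, sketched here with plain SQLite and hypothetical columns, is simply selecting only the columns the listing needs:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT, data TEXT, meta TEXT)")
# The `data` column holds a large JSON blob (e.g. extracted document text).
conn.execute("INSERT INTO file VALUES (?, ?, ?, ?)",
             ("f1", "doc.pdf", json.dumps({"content": "x" * 100_000}), "{}"))

# SELECT * would drag the large `data` blob over the wire on every listing.
# Selecting only id and filename avoids that transfer entirely.
row = conn.execute(
    "SELECT id, filename FROM file WHERE id = ?", ("f1",)).fetchone()
print(row)
```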
3. Optimize Retrieval Full Context Mode
Current Problem: `get_sources_from_items()` loops through file IDs individually when processing collections in "full" context mode, calling `Files.get_file_by_id()` in a loop. (`4d7fddaf7e/backend/open_webui/retrieval/utils.py`, L638-L649)
Implementation: Replace the loop with batch fetching.
This changes O(N) queries to O(1) queries.
4. Add Database Indexes
Implementation: Add composite indexes to improve `IN` query performance. PostgreSQL can use these indexes to optimize the `WHERE file.id IN (...)` queries with ordering.
5. Implement Result Caching
Implementation: Add Redis caching for file metadata:
This reduces database load for frequently accessed file metadata.
6. Pagination for Large Collections
Implementation: Add cursor-based pagination to knowledge endpoints:
This prevents loading all knowledge collections and their files at once.
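A minimal sketch of cursor-based (keyset) pagination over a hypothetical `knowledge` table, using SQLite:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledge (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO knowledge VALUES (?, ?)",
                 [(f"k{i:03d}", f"Collection {i}") for i in range(250)])

def list_knowledge(cursor: Optional[str], limit: int = 100):
    """Keyset pagination: resume after the last id seen (never OFFSET,
    which degrades linearly with page depth)."""
    rows = conn.execute(
        "SELECT id, name FROM knowledge WHERE id > ? ORDER BY id LIMIT ?",
        (cursor or "", limit),
    ).fetchall()
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor

page1, cur = list_knowledge(None)   # first page of 100
page2, cur = list_knowledge(cur)    # resumes exactly where page 1 ended
print(page2[0])
```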
7. Optimize Batch File Processing
Current Problem: `process_files_batch()` processes files sequentially and makes individual database updates. (`4d7fddaf7e/backend/open_webui/routers/retrieval.py`, L2394-L2467)
Implementation: Use bulk operations, and add the bulk method to `FilesTable`.
This reduces N updates to 1 transaction.
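A sketch of the bulk-update idea (hypothetical schema, SQLite): all status changes go through one `executemany` inside a single transaction instead of one UPDATE-plus-commit per file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id TEXT PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO file VALUES (?, 'pending')",
                 [(f"f{i}",) for i in range(1000)])

# Instead of 1000 round-trips with a commit each, apply every status
# change in one executemany within a single transaction.
updates = [("processed", f"f{i}") for i in range(1000)]
with conn:  # one transaction for the whole batch
    conn.executemany("UPDATE file SET status = ? WHERE id = ?", updates)

done = conn.execute(
    "SELECT COUNT(*) FROM file WHERE status = 'processed'").fetchone()[0]
print(done)
```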
8. Add Query Result Streaming
Implementation: For endpoints returning large file lists, implement streaming:
This prevents memory exhaustion with large collections.
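A sketch of the streaming shape (hypothetical schema, SQLite): a generator yields serialized rows in batches, which is the form a FastAPI `StreamingResponse` would consume, so the full file list never has to be materialized in memory at once:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT)")
conn.executemany("INSERT INTO file VALUES (?, ?)",
                 [(f"f{i}", f"doc{i}.pdf") for i in range(10_000)])

def stream_files(batch_size: int = 500):
    """Yield one JSON line per file, fetching rows in batches so only
    `batch_size` rows are resident at a time."""
    cur = conn.execute("SELECT id, filename FROM file ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for fid, name in batch:
            yield json.dumps({"id": fid, "filename": name}) + "\n"

# The consumer pulls lazily instead of receiving a 10K-element list.
first_line = next(stream_files())
print(first_line.strip())
```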
Other Affected Endpoints
The same optimization patterns apply in knowledge.py:
- `get_knowledge_by_id()` - uses `Files.get_file_metadatas_by_ids()` for a single collection
- `update_knowledge_by_id()` - same pattern
- `add_files_to_knowledge_batch()` - loops through files individually
- `reindex_knowledge_files()` - uses `Files.get_files_by_ids()` but could benefit from streaming
Notes
The frontend also makes individual file requests in `KnowledgeBase.svelte` when loading file content, using a cache to mitigate repeated requests. This client-side caching helps but doesn't address the underlying N+1 query pattern on the backend.
@deliciousbob commented on GitHub (Oct 6, 2025):
Wow, thx Ricardo for your detailed reply and your recommendations on that topic.
I am not a programmer, so I'll probably need another day or two to fully understand all your proposals :)
But I've checked the first part and you got it on point:
Current Problem: The knowledge listing endpoints (GET /api/knowledge/ and GET /api/knowledge/list) iterate through each knowledge collection and make separate queries for file metadata.
I've checked /api/knowledge and /api/knowledge/list; they both list all files including the related collection_id. (It seems to me that these two API endpoints deliver exactly the same data :-/ Am I wrong?)
I checked all the knowledge endpoints, and as far as i could tell, there is currently no way to only get a list of all knowledge collections (without related files).
Would it be possible to use /api/knowledge/ to list only the knowledge collections (without files)?
(Or introduce a new endpoint /api/knowledge/collection.)
Listing the collections in the chat currently shows all files from every knowledge collection (like having a list of 70K documents).
If there were an implementation to only list the collections, problem solved.
Same goes for Workspace -> Knowledge: listing only the collections would be done within ms.
Clicking on a collection -> load only the files related to that collection -> having pagination (like you mentioned in part 7) would then be a welcome extra :)
I know it is not that simple, but thx for listening to my thoughts :)
@rgaricano commented on GitHub (Oct 7, 2025):
The difference is in the permissions: both functions have identical logic for fetching files and handling missing file IDs, but the permission filter determines which knowledge bases are returned.
@Classic298 commented on GitHub (Oct 7, 2025):
Now this sounds like a job for ~~me~~ @ShirasawaSama 😄
@ShirasawaSama commented on GitHub (Oct 7, 2025):
Sorry, I'm not very familiar with the backend code for the knowledge base. I might only be able to help with frontend modifications.
But the main reason is that I've hardly ever used the knowledge base feature. 😂
@expruc commented on GitHub (Oct 11, 2025):
@deliciousbob The recent version 0.6.33 addresses some of the problems you have mentioned, especially the loading of the workspace and knowledge pages. I had a similar issue in my env, and after the upgrade loading time is significantly faster. You will still encounter other problems (such as attaching a knowledge collection to the chat), but overall the experience seems smoother (in my env at least).
@deliciousbob commented on GitHub (Oct 13, 2025):
Hey guys, thx for the update! I've just tested 0.6.33 in my test env and I can confirm that it loads collections much faster in the Workspace and in chats too.
I'll report back once I've updated the PROD environment with the 70K documents. I'll do some more tests on that. Thank you very much for all your changes!
Update: The first knowledge retrieval caused an error because it exceeded the max tokens on my models.
I got 413 sources, but I've set Top_K to 35 max in the Retrieval settings.
Is there anything I configured wrong? Thx for your help!
@deliciousbob commented on GitHub (Oct 13, 2025):
Issue with Retrieval seems to be fixed according to https://github.com/open-webui/open-webui/issues/18133
looking forward to the next update. Thx
@deliciousbob commented on GitHub (Oct 13, 2025):
Just tested 0.6.33 in my production env with the 70K documents.
(Version from issue #18133 -> ghcr.io/open-webui/open-webui:git-c4832fd-slim)
Workspace load is probably a bit faster.
Within the chat bar, it still loads extremely slowly:
-> In total it takes about 50 sec. to show the knowledge list within the chat (same as loading the list of files within a collection).
The problem is still that it loads the huge list of all files; if I scroll down fast, the list marker does not even move.
In my opinion there is no need to list files in the knowledge selection within the chat.
Is there any way you could remove the files from the chat -> knowledge -> listing?
Or create an environment variable to "disable listing single files in knowledge retrieval (for large collections)"?
Thank you very much!
@by-lin commented on GitHub (Oct 13, 2025):
Hey @deliciousbob thanks for sharing your experience. I have a similar setup in which we are running:
Version 0.6.33 doesn't seem stable for me. Whenever I prompt a chat it gets stuck with the loading dot and then just shows some citations with no output.
I'm curious what your settings and hardware specs look like and what kind of optimization you have tried in the DB, for both the vector DB and the backend OWUI DB. @deliciousbob
What are your waiting times per response? I see that you are using Azure OpenAI, so resources seem plentiful to you.
@deliciousbob commented on GitHub (Oct 14, 2025):
Hi @by-lin thx for sharing your setup.
We moved from Ollama to vLLM, but we only use vLLM for the embedding (snowflake-embed-v2-l) and reranking (bge-reranker-v2-m3) models, locally on a 3-node cluster with 2x 4070 Ti Super each. For the rest we use Azure AI, and that is astonishingly cheap for chat inference. (1K total users consumed ~900€ a year.)
I've only made some changes on the PGVector container, some small adjustments to the postgresql.conf settings.
I've not yet changed the settings on the PostgreSQL OWUI DB, as I had the feeling that I cannot optimize a lot if OWUI lists all 70K documents when adding knowledge to the chat prompt.
Do you have a similar experience of waiting when adding a knowledge collection in the chat or opening the collections in the Workspace?
@deliciousbob commented on GitHub (Oct 16, 2025):
I've now tested a fork from @expruc (https://github.com/open-webui/open-webui/pull/18292)
He included an env variable to exclude the files from the knowledge list.
This is the way; it works perfectly with my test system (10K files).
Before, it was like a 5-7 sec wait for the list to fill; now it instantly shows the knowledge collections (just without files).
Thank you very much @expruc for fixing this issue!
Hope it will be committed to production soon :) thx
@deliciousbob commented on GitHub (Nov 11, 2025):
Hi guys, I don't see any progress on that topic atm :-( Is there anything I can help you with?
I really want to use OWUI for my use cases. Is there any tested config that works with 100K documents?
I could replace pgsql as the DB backend and pgvector as my VectorDB.
Thank you, best regards, Robert
@Classic298 commented on GitHub (Nov 11, 2025):
@deliciousbob no matter the inefficiencies of the backend, if you work with 100k documents as you say, (each of which probably has 50 chunks) which results in 100000*50 = 5mil vectors, you should use a more performant database than pgvector
@deliciousbob commented on GitHub (Nov 11, 2025):
PGVector alone has never been an issue with our 70K documents; with direct retrieval, we get results in under a second.
Can you please tell me a recommended setup that definitely works well with 100K-file retrievals within OWUI?
Or does it ultimately always fail because of the inefficiencies in the RAG process in OWUI?
@tjbck commented on GitHub (Nov 11, 2025):
70k is definitely an extreme number; with that being said, the issue here is purely on the UI/UX side of things, no?
@Classic298 commented on GitHub (Nov 11, 2025):
@tjbck yes, the issue is that Open WebUI fetches ALL files when accessing a knowledge base like this in the chat, instead of accessing JUST the knowledge base. It also fetches all files inside the knowledge base to display them in the little popup, which causes the slowdown because it has to retrieve so many files.
@deliciousbob commented on GitHub (Nov 12, 2025):
Hi guys! It definitely has something to do with the way the knowledge list is populated;
that also affects the API when using the /api/chat/completions endpoint with collections.
PGVector is well capable of handling 100K or more files and giving a response within a second with cosine similarity.
@expruc did a good job with https://github.com/open-webui/open-webui/pull/18292 by disabling single files in the knowledge listing. But it seems not all endpoints are included yet, as we still faced some large delays when testing the changes on the API and in the UI too.
@ka-admin commented on GitHub (Nov 19, 2025):
I noticed that I can't use Open WebUI in Firefox or Chrome anymore because the collection's item count crossed the critical point for the browser's page memory limit. In Firefox I could select a collection to use in a query (after a looooong wait), but I can't run the query itself: the text of the query just disappears after I press Enter or the Go button and nothing happens. In Chrome I can't even load a collection; it just shows an Out of Memory error.
@deliciousbob commented on GitHub (Nov 19, 2025):
The PR from @expruc (https://github.com/open-webui/open-webui/pull/18328) would introduce pagination / lazy loading for the knowledge collections; that helps a lot with loading the list of large collections.
@Classic298 commented on GitHub (Nov 19, 2025):
It can be reopened once the knowledge table migration is done. Otherwise his PR will not help.
His PR WOULD help if it only queried e.g. 100 files.
But the current API endpoints don't allow that.
We first need the knowledge file table migration, and then we can look at that PR again (if someone reopens it), because then we actually have pagination options.
@tjbck commented on GitHub (Dec 2, 2025):
Now that our kb table migration is complete, we just need to introduce proper pagination support alongside with frontend updates!
@Classic298 commented on GitHub (Dec 20, 2025):
should be fixed in dev now finally, pagination was introduced and performance is much better now
@deliciousbob commented on GitHub (Jan 2, 2026):
Hi everyone! I want to thank everyone who was involved in fixing this issue!
The fixes now work perfectly, even for a huge number of documents. I'll do further tests, but so far it works very smoothly.
Thx a lot, you are doing great work!
@deliciousbob commented on GitHub (Jan 14, 2026):
Hi guys, thx again for all the changes. I was not able to fully test everything yet, but I finally managed to re-upload all files and update my production setup. There is now a huge improvement when adding knowledge collections to the chat.
Unfortunately I still noticed a big delay in the retrieval process; not sure if any of you have the same issue as I do.
@ka-admin commented on GitHub (Jan 14, 2026):
I can confirm that the first time loading a knowledge collection has a delay (I see in the monitoring software that Open WebUI is loading data from my NVMe intensively). But after that warm-up everything works fine. It is unpleasant but not so critical, because before the fix my browser ran into an out-of-memory situation and I couldn't use RAG at all.
@Classic298 commented on GitHub (Jan 14, 2026):
Yeah, that might just be unsolvable from Open WebUI's end.
If the data is on disk and the vector database has to initialize it first by loading it into memory... yeah. The first time is slow; after that, much faster.
@deliciousbob commented on GitHub (Jan 14, 2026):
Hi guys, thx for the quick reply, that makes total sense, i‘ll do further tests on the big Collection tomorrow and will try to monitor the request on the pgvector side too.
@deliciousbob commented on GitHub (Jan 16, 2026):
Hi guys, I've tested it again. I have constant delays of approx. 38 seconds every time I start a request on the 25K-document knowledge collection. As I told you, requesting chunks from PGVector directly only takes a second on the same collection.
:-( Does the PGVector request maybe still use the old schema for retrieving chunks?
Thank you for your help!
@Classic298 commented on GitHub (Jan 16, 2026):
@deliciousbob are you talking about the original issue (loading files in workspace) or querying the KB?
If the latter, that was not part of this issue and you'd need to provide a LOT more information. Like index type, RAM usage of the database, if you can even load everything to memory or not and many many more aspects. Basically would need to know the full setup.
Especially because previously you confirmed that querying pgvector was very fast, and absolutely nothing was changed in the code in that sense. So any sudden slowness is most likely due to configuration, deployment and other environment factors rather than Open WebUI.
@deliciousbob commented on GitHub (Jan 16, 2026):
What I mentioned is that the load of the KB list in the chat and in the collection list is a massive improvement!
But the request against this large KC takes much longer than on small collections, so there must be something wrong, as I request at most 30 chunks.
The setup is the same as before, only with 25K documents instead of 70K. RAM and CPU are not full at all.
As mentioned, if I run a direct query (from pgAdmin or a Node-RED workflow) against the pgvector DB, I get the response of 30 chunks within one second.
The same request in OWUI takes 38 sec. Similar speed to what I saw in the previous version when loading the KB file list in the chat.
@Classic298 commented on GitHub (Jan 16, 2026):
interesting
was it slow before too?
@deliciousbob commented on GitHub (Jan 16, 2026):
As far as I can remember, it got that bad after I imported the 70K documents.
I even had timeouts in the browser when loading the knowledge list in the chat window.
With your changes, working with collections is now very smooth, but the request itself is still as slow as it was before.
With smaller collections I get an answer after 10-15 seconds. With the 25K-doc collection, it takes around a minute.
I did some requests, all with similar load times. Here are the container logs:
```
2026-01-16 18:40:09.280 | DEBUG | aiocache.base:set:280 - SET <function at 0x72abb40dc900> 1 (0.0000)s
2026-01-16 18:40:09.330 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 10.0.0.2:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2026-01-16 18:40:09.379 | DEBUG | open_webui.retrieval.utils:get_sources_from_items:948 - items: [{'type': 'collection', 'id': 'cb869842-3b92-4a16-866b-9b440728c1bd', 'user_id': '9c7aa294-9270-4c84-84a3-86714ea301cc', 'name': 'JIRA IT-HELP', 'description': 'JIRA IT-HELP', 'meta': None, 'created_at': 1764672683, 'updated_at': 1767619492, 'write_access': True, 'status': 'processed'}] ['VPN error you are not allowed to access troubleshooting', 'VPN access denied error solutions', 'VPN connection worked yesterday but not today fix'] <function chat_completion_files_handler.. at 0x72ab1ea493a0> <function chat_completion_files_handler.. at 0x72ab28f75ee0> False
2026-01-16 18:40:09.382 | DEBUG | open_webui.retrieval.utils:query_collection_with_hybrid_search:477 - query_collection_with_hybrid_search:VECTOR_DB_CLIENT.get:collection cb869842-3b92-4a16-866b-9b440728c1bd
2026-01-16 18:40:36.051 | INFO | open_webui.retrieval.utils:query_collection_with_hybrid_search:487 - Starting hybrid search for 3 queries in 1 collections...
2026-01-16 18:40:36.052 | DEBUG | open_webui.retrieval.utils:query_doc_with_hybrid_search:241 - query_doc_with_hybrid_search:doc cb869842-3b92-4a16-866b-9b440728c1bd
2026-01-16 18:40:40.809 | DEBUG | open_webui.retrieval.utils:query_doc_with_hybrid_search:241 - query_doc_with_hybrid_search:doc cb869842-3b92-4a16-866b-9b440728c1bd
2026-01-16 18:40:46.770 | DEBUG | open_webui.retrieval.utils:query_doc_with_hybrid_search:241 - query_doc_with_hybrid_search:doc cb869842-3b92-4a16-866b-9b440728c1bd
```
@Classic298 commented on GitHub (Jan 16, 2026):
Root Cause Analysis
After investigating the logs, the performance bottleneck may have been identified:
The ~27 second delay occurs in the `VECTOR_DB_CLIENT.get()` call, which fetches ALL 25K documents from PGVector before any search can begin.
18:40:09.382 - query_collection_with_hybrid_search:VECTOR_DB_CLIENT.get:collection
18:40:36.051 - Starting hybrid search for 3 queries in 1 collections...
Why This Happens
You most definitely use hybrid search, no?
Open WebUI's hybrid search uses BM25 (lexical) + vector search. The BM25 component requires loading all document text into memory to build an in-memory index on every query. This is fundamentally different from a pure vector search, which uses database-native indexes (HNSW/IVFFlat) and only returns top-K results.
Direct PGVector queries are fast (~1 second) because they use indexed vector similarity search. Open WebUI must load all docs for BM25.
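A toy illustration of why BM25 needs the whole corpus (minimal BM25 scoring, not Open WebUI's actual implementation): the IDF and average-length statistics are corpus-wide quantities, so the index has to see every document before it can score anything.

```python
import math
from collections import Counter

docs = [
    "vpn access denied error",
    "vpn connection troubleshooting guide",
    "printer out of paper",
]

# Building BM25 statistics requires touching EVERY document:
# document frequencies and average length are corpus-wide.
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(term for d in tokenized for term in set(d))

def bm25_score(query, doc_idx, k1=1.5, b=0.75):
    """Score one document against a query with the classic BM25 formula."""
    doc = tokenized[doc_idx]
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

scores = [bm25_score("vpn error", i) for i in range(N)]
best = max(range(N), key=scores.__getitem__)
print(best)  # doc 0 matches both "vpn" and "error"
```

A vector index, by contrast, answers a top-K query without scanning the corpus, which is why the direct PGVector queries return in about a second.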
Potential Solutions
1. In-memory caching - Cache the `GetResult` (collection data) in memory per collection, invalidating when documents are added/removed. This would make subsequent queries instant. BUT: initial queries will still be slow, there is massive memory overhead to keep everything in memory, and if a single file changes, that invalidates the cache.
2. PostgreSQL full-text search - Replace the in-memory BM25 with `tsvector`/`tsquery` for lexical search, using database-native indexing.
3. Pre-built BM25 indexes - Serialize and store BM25 indexes, rebuilding only when the collection changes.
4. Disable hybrid for large collections - Add a threshold (e.g., 10K docs) to skip BM25 and use pure vector search.
Workaround (Now)
If you disable hybrid search, it should skip the BM25 component entirely and use pure vector search, which won't require loading all documents.
@deliciousbob
@deliciousbob commented on GitHub (Jan 17, 2026):
Hi @Classic298
Thanks for the tip; BM25 seems to be the problem indeed. I've disabled hybrid search, and the response is now generated within 1-2 seconds.
But disabling hybrid search comes with a downside too: I lose the option to do reranking.
I've configured bge-reranker-v2-m3 from an external vLLM API to narrow down the retrieved chunks from 30 to the most relevant 10. This worked best for me and enhanced the quality of the retrieval a lot.
A relatively new addition to RAG is the "enrich hybrid search text" option, which also seems to rely on BM25. According to the comment, it enriches the retrieval with the document title and adds additional context to the BM25 lexical recall.
Is there a way to disable BM25 when reranking is enabled from an external provider?
Thx for your help!
@Classic298 commented on GitHub (Jan 17, 2026):
@deliciousbob the only solution is to use the database's native hybrid search instead of the Open WebUI-native solution.
That would require some implementation work for every single DB, except Chroma DB -> https://github.com/open-webui/open-webui/discussions/20737