Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 11:28:35 -05:00)
[GH-ISSUE #17998] issue: 70K Documents in multiple Collections causes severe slowdown #57125
Originally created by @deliciousbob on GitHub (Oct 2, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/17998
Originally assigned to: @tjbck on GitHub.
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.30
Ollama Version (if applicable)
No response
Operating System
Docker Container
Browser (if applicable)
No response
Confirmation
Expected Behavior
Below 10K documents, opening the Workspace Knowledge Collections was fast (loaded in 1-3 sec.)
Selecting a Knowledge Collection in the chat was also fast (loaded in 1-3 sec.)
Actual Behavior
After uploading 70K documents the whole system got very slow; everything related to documents became extremely slow.
Steps to Reproduce
Good Day Community.
I have some serious performance issues after uploading about 70K documents (2-4 pages each).
I noticed the selection in the chat is terribly slow now. It takes 15-25 sec. to load the knowledge and chat.
I use PostgreSQL as the DB backend for OWUI (both Docker containers on the same VM, running on M.2 disks, so network and disk performance are very good)
and another PGVector DB for the vector data on another VM on the same host (so network and disk performance are good there too).
I noticed that the knowledge selection in the chat shows a huge list of collections and files.
- Is there a way to only show collections?
Logs & Screenshots
I can provide if needed
Additional Information
PostgreSQL v17 as OWUI DB
PGVector v17 as VectorDB
@deliciousbob commented on GitHub (Oct 2, 2025):
My assumption is that the PostgreSQL for OWUI is causing the slowdown.
Is there any recommendation for the PostgreSQL config ? Thx
@Classic298 commented on GitHub (Oct 2, 2025):
Yes: don't use PostgreSQL for such large file amounts.
Sorry, no other recommendation here.
Better use Qdrant (multitenancy mode) or Milvus (multitenancy mode)
@ka-admin commented on GitHub (Oct 2, 2025):
I'm using qdrant and the situation is just the same
@Classic298 commented on GitHub (Oct 2, 2025):
@ka-admin which qdrant? and what indexing type? and multitenancy or not?
@ka-admin commented on GitHub (Oct 2, 2025):
@Classic298

```shell
docker run -d --name open-webui --network host --restart always \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://192.168.127.20:11434 \
  -e VECTOR_DB=qdrant \
  -e QDRANT_URI=http://192.168.127.20:6333 \
  -e ENABLE_QDRANT_MULTITENANCY_MODE=true \
  -e QDRANT_TIMEOUT=300 \
  -e RAG_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B" \
  -e RAG_RERANKING_MODEL="Qwen/Qwen3-Reranker-0.6B" \
  -e RAG_TOP_K=20 \
  -e RAG_TOP_K_RERANKER=20 \
  -e CHUNK_SIZE=1024 \
  -e CHUNK_OVERLAP=100 \
  -e WEB_LOADER_ENGINE=playwright \
  -e USER_AGENT='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36' \
  ghcr.io/open-webui/open-webui:main
```
qdrant -V
qdrant 1.15.1
@Classic298 commented on GitHub (Oct 2, 2025):
hmmm weird.
@ka-admin commented on GitHub (Oct 2, 2025):
@Classic298 I'll tell you more: inserting one document (a code file, text format) into such a collection takes up to 5-6 seconds. It takes me a day to upload 5000 code files into a collection. CUDA accelerated, NVMe storage of course. When the collection was new it took only about an hour or maybe two to insert 5000 files. As the collection started to grow, insertion speed slowed down dramatically. It looks like there is no caching mechanism (or it works very ineffectively) for telling whether a document is already in the database (hashing or something like that). It looks like every insertion has to re-check all previously inserted files just to tell whether the new file is a potential duplicate. The more files in the collection, the more checks Open WebUI has to do; it is only my suggestion, I'm not sure if it's true or not. But it is what it is: managing large collections is a pain in the back.
@Classic298 commented on GitHub (Oct 2, 2025):
what indexing type do you have configured in qdrant? specs of your machine? especially memory and storage
@deliciousbob commented on GitHub (Oct 2, 2025):
Thx for the quick response.
I am currently testing: PGVector seems to be fine here, I can retrieve vectors within seconds,
so it seems that the file handling or the backend DB of OWUI causes the issue.
I am investigating that further.
I've tested Qdrant and Milvus too; I am by far no expert, but RAM usage seemed to grow quite fast when importing a lot of vectors.
My upload procedure is currently done via the API: I upload the file and then add it to the collection. That works well at 3 sec/document.
The file handling seems to be the issue. I am investigating further and will let you know.
@Classic298 commented on GitHub (Oct 2, 2025):
yes that is intended, as these vector databases are much faster if they use the RAM
@deliciousbob commented on GitHub (Oct 2, 2025):
I've now tested direct queries to my PGVector DB; I get a response within 2-3 sec. (limit 30 / L2 distance / 1024 dimensions / 70K docs). When querying with cosine distance I get a response instantly, within 1 sec.
PGVector seems to be working well and can handle large datasets while consuming less than 6 GB RAM.
My next tests will be on the PostgreSQL backend DB for OWUI, as this is most likely storing the text and the storage location of the files.
@deliciousbob commented on GitHub (Oct 6, 2025):
I did some checks on the backend DB (pg:17)
I checked the requests from OWUI on opening the Workspace -> KCs :
I saw the following SELECT (I could not display the full request as it was too long for the logs):
--> It seems to me that selecting all related documents by enumerating them (WHERE file.id IN ...) is a very inefficient way of querying.
When I checked the `knowledge` table in the DB, I saw that every knowledge entry stores the related file_ids directly in the `data` column:
{"file_ids": ["c40cb6f9-c54e-4c4e-9662-656b63c889ae", "e913c33d-527b-4b72-a8d1-7370b331b439", "96e09c3b-46b9-4b13-ad49-79424b1c44c6", "31e96b1c-6151-4b71-8430-51518a5f3bd0", "b3ee8b6f-9ad2-4e47-963d-3a55fb95f7ba", "6bec315d-3b6a-40ca-b4dd-df462da5a09d", "ef2fd808-90ca-4edc-9306-b0ff35180923", "4bb10c8d-eb8e-4a9f-bb87-3c9c23d517ff", "8265c050-1db3-46e7-93c0-bf9140cd7826", "bd4bc637-809a-40f7-845d-8572be45ee0e", "36a5c879-b43f-4f2a-b81d-1008f8f6b485", "f9743037-0994-4216-af9c-
I am by far no DB expert, and I have the least experience with PostgreSQL, but this seems like a very inefficient way of storing relations. I am used to having something like a relationship table in between, similar to:
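A minimal sketch of what such a junction table could look like (hypothetical table and column names, shown with SQLite purely for illustration, not Open WebUI's actual schema):

```python
import sqlite3

# Hypothetical schema: a knowledge_file junction table instead of a
# JSON array of file ids stored inside the knowledge row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knowledge (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT);
CREATE TABLE knowledge_file (
    knowledge_id TEXT REFERENCES knowledge(id),
    file_id      TEXT REFERENCES file(id),
    PRIMARY KEY (knowledge_id, file_id)
);
""")
conn.execute("INSERT INTO knowledge VALUES ('kc1', 'JIRA IT-HELP')")
conn.executemany("INSERT INTO file VALUES (?, ?)",
                 [("f1", "a.pdf"), ("f2", "b.pdf")])
conn.executemany("INSERT INTO knowledge_file VALUES (?, ?)",
                 [("kc1", "f1"), ("kc1", "f2")])

# One indexed join retrieves a collection's files; no JSON parsing needed.
rows = conn.execute("""
    SELECT f.id, f.filename FROM file f
    JOIN knowledge_file kf ON kf.file_id = f.id
    WHERE kf.knowledge_id = ?
    ORDER BY f.id
""", ("kc1",)).fetchall()
print(rows)
```

With this layout, fetching a collection's files is one indexed join instead of parsing a JSON array out of a text column.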
Key Findings and Recommendations:
This would dramatically enhance the speed of RAG handling in OWUI in larger environments with some thousand documents.
VectorDB retrieval is not the issue, as I mentioned in my previous comment; it seems to be the backend DB, and especially the document retrieval probably needs an overhaul. (Please correct me if I am wrong.)
Let me know if you need further info, i would be happy to help on solving that issue.
Thank you, best regards, Robert
@rgaricano commented on GitHub (Oct 6, 2025):
Other possible optimizations
1. Batch Fetching Across Knowledge Collections
Current Problem: The knowledge listing endpoints (`GET /api/knowledge/` and `GET /api/knowledge/list`) iterate through each knowledge collection and make separate queries for file metadata. (`4d7fddaf7e/backend/open_webui/routers/knowledge.py`, L42-L86)
Implementation: Collect all file IDs upfront, fetch them in a single query, then distribute the results.
This reduces N queries to 1 query.
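A sketch of the pattern (hypothetical schema and helper names, demonstrated with SQLite; the real change would go through Open WebUI's SQLAlchemy models):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT)")
conn.executemany("INSERT INTO file VALUES (?, ?)",
                 [(f"f{i}", f"doc{i}.pdf") for i in range(6)])

# Hypothetical knowledge collections, each holding a list of file ids.
collections = {"kc1": ["f0", "f1"], "kc2": ["f2", "f3", "f4"]}

# Instead of one query per collection (N queries), collect every file id
# up front and fetch all metadata in a single IN (...) query.
all_ids = [fid for ids in collections.values() for fid in ids]
placeholders = ",".join("?" * len(all_ids))
rows = conn.execute(
    f"SELECT id, filename FROM file WHERE id IN ({placeholders})", all_ids
).fetchall()
by_id = {fid: name for fid, name in rows}

# Distribute the single result set back to each collection.
files_per_collection = {
    kc: [{"id": fid, "filename": by_id[fid]} for fid in ids]
    for kc, ids in collections.items()
}
print(files_per_collection["kc1"])
```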
2. Selective Column Loading with SQLAlchemy
Current Problem: `get_file_metadatas_by_ids()` fetches all columns, including the large JSON blobs in `file.data` and `file.meta`. (`4d7fddaf7e/backend/open_webui/models/files.py`, L180-L193)
Implementation: Add a new method with deferred loading.
This avoids transferring large JSON blobs from PostgreSQL when only metadata is needed.
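In SQLAlchemy this would typically use the `load_only()`/`defer()` column options; the underlying effect, sketched here with plain SQLite and hypothetical columns, is simply selecting only the columns the listing needs:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT, data TEXT, meta TEXT)")
# The `data` column holds a large JSON blob (e.g. extracted document text).
conn.execute("INSERT INTO file VALUES (?, ?, ?, ?)",
             ("f1", "doc.pdf", json.dumps({"content": "x" * 100_000}), "{}"))

# SELECT * would drag the large `data` blob over the wire on every listing.
# Selecting only id and filename avoids that transfer entirely.
row = conn.execute(
    "SELECT id, filename FROM file WHERE id = ?", ("f1",)).fetchone()
print(row)
```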
3. Optimize Retrieval Full Context Mode
Current Problem: `get_sources_from_items()` loops through file IDs individually when processing collections in "full" context mode, calling `Files.get_file_by_id()` in a loop. (`4d7fddaf7e/backend/open_webui/retrieval/utils.py`, L638-L649)
Implementation: Replace the loop with batch fetching.
This changes O(N) queries to O(1) queries.
4. Add Database Indexes
Implementation: Add composite indexes to improve `IN` query performance. PostgreSQL can use these indexes to optimize the `WHERE file.id IN (...)` queries with ordering.
5. Implement Result Caching
Implementation: Add Redis caching for file metadata:
This reduces database load for frequently accessed file metadata.
6. Pagination for Large Collections
Implementation: Add cursor-based pagination to knowledge endpoints:
This prevents loading all knowledge collections and their files at once.
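A minimal sketch of cursor-based (keyset) pagination over a hypothetical `knowledge` table, using SQLite:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledge (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO knowledge VALUES (?, ?)",
                 [(f"k{i:03d}", f"Collection {i}") for i in range(250)])

def list_knowledge(cursor: Optional[str], limit: int = 100):
    """Keyset pagination: resume after the last id seen (never OFFSET,
    which degrades linearly with page depth)."""
    rows = conn.execute(
        "SELECT id, name FROM knowledge WHERE id > ? ORDER BY id LIMIT ?",
        (cursor or "", limit),
    ).fetchall()
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor

page1, cur = list_knowledge(None)   # first page of 100
page2, cur = list_knowledge(cur)    # resumes exactly where page 1 ended
print(page2[0])
```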
7. Optimize Batch File Processing
Current Problem: `process_files_batch()` processes files sequentially and makes individual database updates. (`4d7fddaf7e/backend/open_webui/routers/retrieval.py`, L2394-L2467)
Implementation: Use bulk operations, and add the bulk method to `FilesTable`.
This reduces N updates to 1 transaction.
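A sketch of the bulk-update idea (hypothetical schema, SQLite): all status changes go through one `executemany` inside a single transaction instead of one UPDATE-plus-commit per file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id TEXT PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO file VALUES (?, 'pending')",
                 [(f"f{i}",) for i in range(1000)])

# Instead of 1000 round-trips with a commit each, apply every status
# change in one executemany within a single transaction.
updates = [("processed", f"f{i}") for i in range(1000)]
with conn:  # one transaction for the whole batch
    conn.executemany("UPDATE file SET status = ? WHERE id = ?", updates)

done = conn.execute(
    "SELECT COUNT(*) FROM file WHERE status = 'processed'").fetchone()[0]
print(done)
```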
8. Add Query Result Streaming
Implementation: For endpoints returning large file lists, implement streaming:
This prevents memory exhaustion with large collections.
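A sketch of the streaming shape (hypothetical schema, SQLite): a generator yields serialized rows in batches, which is the form a FastAPI `StreamingResponse` would consume, so the full file list never has to be materialized in memory at once:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT)")
conn.executemany("INSERT INTO file VALUES (?, ?)",
                 [(f"f{i}", f"doc{i}.pdf") for i in range(10_000)])

def stream_files(batch_size: int = 500):
    """Yield one JSON line per file, fetching rows in batches so only
    `batch_size` rows are resident at a time."""
    cur = conn.execute("SELECT id, filename FROM file ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for fid, name in batch:
            yield json.dumps({"id": fid, "filename": name}) + "\n"

# The consumer pulls lazily instead of receiving a 10K-element list.
first_line = next(stream_files())
print(first_line.strip())
```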
Other Affected Endpoints
The same optimization patterns apply in knowledge.py:
- `get_knowledge_by_id()` - uses `Files.get_file_metadatas_by_ids()` for a single collection
- `update_knowledge_by_id()` - same pattern
- `add_files_to_knowledge_batch()` - loops through files individually
- `reindex_knowledge_files()` - uses `Files.get_files_by_ids()` but could benefit from streaming
Notes
The frontend also makes individual file requests in `KnowledgeBase.svelte` when loading file content, using a cache to mitigate repeated requests. This client-side caching helps but doesn't address the underlying N+1 query pattern on the backend.
@deliciousbob commented on GitHub (Oct 6, 2025):
Wow, thx Ricardo for your detailed reply and your recommendations on that topic.
I am not a programmer, so I'll probably need another day or two to fully understand all your proposals :)
But I've checked the first part and you got it on point:
Current Problem: The knowledge listing endpoints (GET /api/knowledge/ and GET /api/knowledge/list) iterate through each knowledge collection and make separate queries for file metadata.
I've checked /api/knowledge and /api/knowledge/list; they both list all files including the related collection_id. (It seems to me that these two API endpoints deliver exactly the same data :-/ Am I wrong?)
I checked all the knowledge endpoints, and as far as i could tell, there is currently no way to only get a list of all knowledge collections (without related files).
Would it be possible to use /api/knowledge/ to list only the knowledge collections (without files)?
(Or introduce a new endpoint /api/knowledge/collection.)
Listing the collections in the chat currently shows all files from every knowledge collection (like having a list of 70K documents).
If there were an implementation to only list the collections, problem solved.
Same goes for Workspace -> Knowledge: listing only the collections would be done within ms.
Clicking on a collection -> load only the files related to that collection -> having pagination (like you mentioned in part 7) would then be a welcome extra :)
I know it is not that simple, but thx for listening to my thoughts :)
@rgaricano commented on GitHub (Oct 7, 2025):
The difference is in the permissions: both functions have identical logic for fetching files and handling missing file IDs, but the permission filter determines which knowledge bases are returned.
@Classic298 commented on GitHub (Oct 7, 2025):
Now this sounds like a job for ~~me~~ @ShirasawaSama 😄
@ShirasawaSama commented on GitHub (Oct 7, 2025):
Sorry, I'm not very familiar with the backend code for the knowledge base. I might only be able to help with frontend modifications.
But the main reason is that I've hardly ever used the knowledge base feature. 😂
@expruc commented on GitHub (Oct 11, 2025):
@deliciousbob The recent version 0.6.33 addresses some of the problems you have mentioned, especially the loading of the workspace and knowledge pages. I had a similar issue in my env, and after the upgrade loading time is significantly faster. You will still encounter other problems (such as attaching a knowledge collection to the chat), but overall the experience seems smoother (in my env at least).
@deliciousbob commented on GitHub (Oct 13, 2025):
Hey guys, thx for the update! I've just tested 0.6.33 in my test env and I can confirm that it loads collections much faster in the Workspace and in chats too.
I'll report back once I've updated the PROD environment with the 70K documents. I'll do some more tests on that. Thank you very much for all your changes!
Update: The first knowledge retrieval caused an error because it exceeded the max tokens on my models.
I got 413 sources, but I've set Top_K to 35 max in the Retrieval settings.
Is there anything I configured wrong? Thx for your help!
@deliciousbob commented on GitHub (Oct 13, 2025):
Issue with Retrieval seems to be fixed according to https://github.com/open-webui/open-webui/issues/18133
looking forward to the next update. Thx
@deliciousbob commented on GitHub (Oct 13, 2025):
Just tested 0.6.33 in my production env with the 70K documents.
(Version from issue #18133 -> ghcr.io/open-webui/open-webui:git-c4832fd-slim)
Workspace load is probably a bit faster.
Within the chat bar, it still loads extremely slowly:
-> In total it takes about 50 sec. to show the knowledge list within the chat (same as loading the list of files within a collection).
The problem is still that it loads the huge list of all files; if I scroll down fast, the list marker does not even move.
In my opinion there is no need to list files in the knowledge selection within the chat.
Is there any way you could remove the files from the chat -> knowledge -> listing?
Or create an environment variable to "disable listing single files in knowledge retrieval (for large collections)"?
Thank you very much!
@by-lin commented on GitHub (Oct 13, 2025):
Hey @deliciousbob thanks for sharing your experience. I have a similar setup in which we are running:
Version 0.6.33 doesn't seem stable for me. Whenever I prompt a chat it gets stuck with the loading dot and then just shows some citations with no output.
I'm curious what your settings and hardware specs look like and what kind of optimization you have tried in the DB, for both the vector DB and the backend OWUI DB. @deliciousbob
What are your waiting times per response? I see that you are using Azure OpenAI, so resources seem plentiful to you.
@deliciousbob commented on GitHub (Oct 14, 2025):
Hi @by-lin thx for sharing your setup.
We moved from Ollama to vLLM, but we only use vLLM for the embedding (snowflake-embed-v2-l) and reranking (bge-reranker-v2-m3) models, locally on a 3-node cluster with 2x 4070 Ti Super each. For the rest we use Azure AI, and that is astonishingly cheap for chat inference. (1K total users consumed ~900€ a year.)
I've only made some changes on the PGVector container, some small adjustments to the postgresql.conf settings.
I've not yet changed the settings on the PostgreSQL OWUI DB, as I had the feeling that I cannot optimize a lot if OWUI lists all 70K documents when adding knowledge to the chat prompt.
Do you have a similar experience of waiting when adding a knowledge collection in the chat or opening the collections in the Workspace?
@deliciousbob commented on GitHub (Oct 16, 2025):
I've now tested a fork from @expruc (https://github.com/open-webui/open-webui/pull/18292)
He included an env variable to exclude the files from the knowledge list.
This is the way; it works perfectly with my test system (10K files).
Before, it was like a 5-7 sec wait for the list to fill; now it instantly shows the knowledge collections (just without files).
Thank you very much @expruc for fixing this issue!
Hope it will be committed to production soon :) thx
@deliciousbob commented on GitHub (Nov 11, 2025):
Hi guys, I don't see any progress on that topic atm :-( Is there anything I can help you with?
I really want to use OWUI for my use cases. Is there any tested config that works with 100K documents?
I could replace pgsql as the DB backend and pgvector as my VectorDB.
Thank you, best regards, Robert
@Classic298 commented on GitHub (Nov 11, 2025):
@deliciousbob no matter the inefficiencies of the backend, if you work with 100k documents as you say, (each of which probably has 50 chunks) which results in 100000*50 = 5mil vectors, you should use a more performant database than pgvector
@deliciousbob commented on GitHub (Nov 11, 2025):
PGVector alone has never been an issue with our 70K documents; with direct retrieval, we get results in under a second.
Can you please tell me a recommended setup that definitely works well with 100K-file retrievals within OWUI?
Or does it ultimately always fail because of the inefficiencies in the RAG process in OWUI?
@tjbck commented on GitHub (Nov 11, 2025):
70k is definitely an extreme number; with that being said, the issue here is purely on the UI/UX side of things, no?
@Classic298 commented on GitHub (Nov 11, 2025):
@tjbck yes, the issue is that Open WebUI fetches ALL files when accessing a knowledge base like this in the chat, instead of accessing JUST the knowledge base. It also fetches all files inside the knowledge base to display them in the little popup, which causes the slowdown because it has to retrieve so many files.
@deliciousbob commented on GitHub (Nov 12, 2025):
Hi guys! It definitely has something to do with the way the knowledge list is populated;
that also affects the API when using the /api/chat/completions endpoint with collections.
PGVector is well capable of handling 100K or more files and giving a response within a second with cosine similarity.
@expruc did a good job with https://github.com/open-webui/open-webui/pull/18292 by disabling single files in the knowledge listing. But it seems not all endpoints are included yet, as we still faced some large delays when testing the changes on the API and in the UI too.
@ka-admin commented on GitHub (Nov 19, 2025):
I noticed that I can't use Open WebUI in Firefox or Chrome anymore because the collection's item count crossed the critical point for the browser's page memory limit. In Firefox I could select a collection to use in a query (after a looooong wait), but I can't run the query itself: the text of the query just disappears after I press Enter or the Go button and nothing happens. In Chrome I can't even load a collection; it just shows an Out of Memory error.
@deliciousbob commented on GitHub (Nov 19, 2025):
The PR from @expruc (https://github.com/open-webui/open-webui/pull/18328) would introduce pagination / lazy loading for the knowledge collections; that helps a lot with loading the list of large collections.
@Classic298 commented on GitHub (Nov 19, 2025):
It can be reopened once the knowledge table migration is done. Otherwise his PR will not help.
His PR WOULD help if it only queried e.g. 100 files.
But the current API endpoints don't allow that.
We first need the knowledge file table migration, and then we can look at that PR again (if someone reopens it), because then we actually have pagination options.
@tjbck commented on GitHub (Dec 2, 2025):
Now that our kb table migration is complete, we just need to introduce proper pagination support alongside with frontend updates!
@Classic298 commented on GitHub (Dec 20, 2025):
should be fixed in dev now finally, pagination was introduced and performance is much better now
@deliciousbob commented on GitHub (Jan 2, 2026):
Hi everyone! I want to thank everyone who was involved in fixing this issue!
The fixes now work perfectly, even for a huge number of documents. I'll do further tests, but so far it works very smoothly.
Thx a lot, you are doing great work!
@deliciousbob commented on GitHub (Jan 14, 2026):
Hi guys, thx again for all the changes. I was not able to fully test everything yet, but I finally managed to re-upload all files and update my production setup. There is now a huge improvement when adding knowledge collections to the chat.
Unfortunately I still noticed a big delay in the retrieval process; not sure if any of you have the same issue as I do.
@ka-admin commented on GitHub (Jan 14, 2026):
I can confirm that the first time loading a knowledge collection has a delay (I see in the monitoring software that Open WebUI is loading data from my NVMe intensively). But after that warm-up everything works fine. It is unpleasant but not so critical, because before the fix my browser ran into an out-of-memory situation and I couldn't use RAG at all.
@Classic298 commented on GitHub (Jan 14, 2026):
Yeah, that might just be unsolvable from Open WebUI's end.
If the data is on disk and the vector database has to initialize it first by loading it into memory... yeah. The first time is slow; after that, much faster.
@deliciousbob commented on GitHub (Jan 14, 2026):
Hi guys, thx for the quick reply, that makes total sense, i‘ll do further tests on the big Collection tomorrow and will try to monitor the request on the pgvector side too.
@deliciousbob commented on GitHub (Jan 16, 2026):
Hi guys, I've tested it again. I have constant delays of approx. 38 seconds every time I start a request on the 25K-document knowledge collection. As I told you, requesting chunks from PGVector directly only takes a second on the same collection.
:-( Does the PGVector request maybe still use the old schema for retrieving chunks?
Thank you for your help!
@Classic298 commented on GitHub (Jan 16, 2026):
@deliciousbob are you talking about the original issue (loading files in workspace) or querying the KB?
If the latter, that was not part of this issue and you'd need to provide a LOT more information. Like index type, RAM usage of the database, if you can even load everything to memory or not and many many more aspects. Basically would need to know the full setup.
Especially because previously you confirmed that querying pgvector was very fast, and absolutely nothing was changed in the code in that sense. So any sudden slowness is most likely due to configuration, deployment and other environment factors rather than Open WebUI.
@deliciousbob commented on GitHub (Jan 16, 2026):
What I mentioned is that the load of the KB list in the chat and in the collection list is a massive improvement!
But the request against this large KC takes much longer than on small collections, so there must be something wrong, as I request at most 30 chunks.
The setup is the same as before, only with 25K documents instead of 70K. RAM and CPU are not full at all.
As mentioned, if I run a direct query (from pgAdmin or a Node-RED workflow) against the pgvector DB, I get the response of 30 chunks within one second.
The same request in OWUI takes 38 sec. Similar speed to what I saw in the previous version when loading the KB file list in the chat.
@Classic298 commented on GitHub (Jan 16, 2026):
interesting
was it slow before too?
@deliciousbob commented on GitHub (Jan 16, 2026):
As far as I can remember, it got that bad after I imported the 70K documents.
I even had timeouts in the browser when loading the knowledge list in the chat window.
With your changes, working with collections is now very smooth, but the request itself is still as slow as it was before.
With smaller collections I get an answer after 10-15 seconds. With the 25K-doc collection, it takes around a minute.
I did some requests, all with similar load times. Here are the container logs:
```
2026-01-16 18:40:09.280 | DEBUG | aiocache.base:set:280 - SET <function at 0x72abb40dc900> 1 (0.0000)s
2026-01-16 18:40:09.330 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 10.0.0.2:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2026-01-16 18:40:09.379 | DEBUG | open_webui.retrieval.utils:get_sources_from_items:948 - items: [{'type': 'collection', 'id': 'cb869842-3b92-4a16-866b-9b440728c1bd', 'user_id': '9c7aa294-9270-4c84-84a3-86714ea301cc', 'name': 'JIRA IT-HELP', 'description': 'JIRA IT-HELP', 'meta': None, 'created_at': 1764672683, 'updated_at': 1767619492, 'write_access': True, 'status': 'processed'}] ['VPN error you are not allowed to access troubleshooting', 'VPN access denied error solutions', 'VPN connection worked yesterday but not today fix'] <function chat_completion_files_handler.. at 0x72ab1ea493a0> <function chat_completion_files_handler.. at 0x72ab28f75ee0> False
2026-01-16 18:40:09.382 | DEBUG | open_webui.retrieval.utils:query_collection_with_hybrid_search:477 - query_collection_with_hybrid_search:VECTOR_DB_CLIENT.get:collection cb869842-3b92-4a16-866b-9b440728c1bd
2026-01-16 18:40:36.051 | INFO | open_webui.retrieval.utils:query_collection_with_hybrid_search:487 - Starting hybrid search for 3 queries in 1 collections...
2026-01-16 18:40:36.052 | DEBUG | open_webui.retrieval.utils:query_doc_with_hybrid_search:241 - query_doc_with_hybrid_search:doc cb869842-3b92-4a16-866b-9b440728c1bd
2026-01-16 18:40:40.809 | DEBUG | open_webui.retrieval.utils:query_doc_with_hybrid_search:241 - query_doc_with_hybrid_search:doc cb869842-3b92-4a16-866b-9b440728c1bd
2026-01-16 18:40:46.770 | DEBUG | open_webui.retrieval.utils:query_doc_with_hybrid_search:241 - query_doc_with_hybrid_search:doc cb869842-3b92-4a16-866b-9b440728c1bd
```
@Classic298 commented on GitHub (Jan 16, 2026):
Root Cause Analysis
After investigating the logs, the performance bottleneck may have been identified:
The ~27 second delay occurs in the `VECTOR_DB_CLIENT.get()` call, which fetches ALL 25K documents from PGVector before any search can begin.
18:40:09.382 - query_collection_with_hybrid_search:VECTOR_DB_CLIENT.get:collection
18:40:36.051 - Starting hybrid search for 3 queries in 1 collections...
Why This Happens
You most definitely use hybrid search, no?
Open WebUI's hybrid search uses BM25 (lexical) + vector search. The BM25 component requires loading all document text into memory to build an in-memory index on every query. This is fundamentally different from a pure vector search, which uses database-native indexes (HNSW/IVFFlat) and only returns top-K results.
Direct PGVector queries are fast (~1 second) because they use indexed vector similarity search. Open WebUI must load all docs for BM25.
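A toy illustration of why BM25 needs the whole corpus (minimal BM25 scoring, not Open WebUI's actual implementation): the IDF and average-length statistics are corpus-wide quantities, so the index has to see every document before it can score anything.

```python
import math
from collections import Counter

docs = [
    "vpn access denied error",
    "vpn connection troubleshooting guide",
    "printer out of paper",
]

# Building BM25 statistics requires touching EVERY document:
# document frequencies and average length are corpus-wide.
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(term for d in tokenized for term in set(d))

def bm25_score(query, doc_idx, k1=1.5, b=0.75):
    """Score one document against a query with the classic BM25 formula."""
    doc = tokenized[doc_idx]
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

scores = [bm25_score("vpn error", i) for i in range(N)]
best = max(range(N), key=scores.__getitem__)
print(best)  # doc 0 matches both "vpn" and "error"
```

A vector index, by contrast, answers a top-K query without scanning the corpus, which is why the direct PGVector queries return in about a second.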
Potential Solutions
1. In-memory caching - Cache the `GetResult` (collection data) in memory per collection, invalidating when documents are added/removed. This would make subsequent queries instant. BUT: initial queries will still be slow, there is massive memory overhead to keep everything in memory, and if a single file changes, that invalidates the cache.
2. PostgreSQL full-text search - Replace the in-memory BM25 with `tsvector`/`tsquery` for lexical search, using database-native indexing.
3. Pre-built BM25 indexes - Serialize and store BM25 indexes, rebuilding only when the collection changes.
4. Disable hybrid for large collections - Add a threshold (e.g., 10K docs) to skip BM25 and use pure vector search.
Workaround (Now)
If you disable hybrid search, it should skip the BM25 component entirely and use pure vector search, which won't require loading all documents.
@deliciousbob
@deliciousbob commented on GitHub (Jan 17, 2026):
Hi @Classic298
Thanks for the tip; BM25 seems to be the problem indeed. I've disabled hybrid search, and the response is now generated within 1-2 seconds.
But disabling hybrid search comes with a downside too: I lose the option to do reranking.
I've configured bge-reranker-v2-m3 from an external vLLM API to narrow down the retrieved chunks from 30 to the most relevant 10. This worked best for me and enhanced the quality of the retrieval a lot.
A relatively new addition to RAG is the "enrich hybrid search text" option, which also seems to rely on BM25. According to the comment, it enriches the retrieval with the document title and adds additional context to the BM25 lexical recall.
Is there a way to disable BM25 when reranking is enabled from an external provider?
Thx for your help!
@Classic298 commented on GitHub (Jan 17, 2026):
@deliciousbob the only solution is to use the database's native hybrid search instead of the Open WebUI-native solution.
That would require some implementation work for every single DB, except Chroma DB -> https://github.com/open-webui/open-webui/discussions/20737