[GH-ISSUE #19884] feat: Limit files with a large number of chunks #57693

Closed
opened 2026-05-05 21:24:50 -05:00 by GiteaMirror · 2 comments

Originally created by @tomasloksa on GitHub (Dec 11, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19884

Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

Uploading files that are not necessarily large (say, 1.5MB) but contain a lot of text, such as log files or JSON, can produce a huge number of chunks (25,000 in my case).

![Image](https://github.com/user-attachments/assets/16cbf799-82ab-4cca-9efb-6c86d1c774ba)
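
For intuition, here is a rough back-of-the-envelope estimate of how a character-heavy file explodes into chunks, assuming a plain overlapping character splitter (the chunk size and overlap below are illustrative assumptions, not Open WebUI's actual defaults):

```python
# Rough chunk-count estimate for an overlapping character splitter.
# Chunk size and overlap are illustrative, not Open WebUI's defaults.
import math

def estimate_chunks(num_chars: int, chunk_size: int = 1000, overlap: int = 100) -> int:
    """Each chunk advances by (chunk_size - overlap) characters."""
    step = chunk_size - overlap
    return max(1, math.ceil(num_chars / step))

# A 1.5 MB plain-text file is roughly 1.5 million characters:
print(estimate_chunks(1_500_000))            # ~1667 chunks at these settings
print(estimate_chunks(1_500_000, 200, 50))   # ~10000 chunks with smaller chunks
```

The point is that chunk count scales with character count, not file size, so smaller chunk settings or denser text push the number up fast.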

I'm using an external embedding model (text-embedding-3-small), so it managed to process everything, but the problem appears when saving the embeddings into the vector DB: there it often caused the container to run out of memory and get killed, or to simply become unresponsive. Originally I was using ChromaDB on an Azure File Share, but even after migrating to PGVector the issue persists (although it's generally better).
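
For what it's worth, the usual way to bound memory during the save step is to insert in fixed-size batches. A minimal sketch against PGVector with psycopg2, assuming a hypothetical `document_chunk(text, embedding vector)` table (this is not Open WebUI's actual schema or code):

```python
# Minimal sketch: write embeddings to a pgvector table in fixed-size
# batches so the full embedding set never sits in memory at once.
# Table and column names are hypothetical, not Open WebUI's schema.
import psycopg2
from psycopg2.extras import execute_values

BATCH_SIZE = 500  # tune to your memory budget

def _flush(cur, batch):
    execute_values(
        cur,
        "INSERT INTO document_chunk (text, embedding) VALUES %s",
        batch,
        template="(%s, %s::vector)",
    )

def save_embeddings(conn, rows):
    """rows: iterable of (chunk_text, embedding) pairs, where embedding
    is a list of floats; consumed lazily, one batch at a time."""
    batch = []
    with conn.cursor() as cur:
        for text, emb in rows:
            # pgvector accepts the '[1,2,3]' text format, cast above
            batch.append((text, "[" + ",".join(map(str, emb)) + "]"))
            if len(batch) >= BATCH_SIZE:
                _flush(cur, batch)
                conn.commit()
                batch.clear()
        if batch:  # flush the remainder
            _flush(cur, batch)
            conn.commit()
```

Even with batching, though, 25,000 inserts per file is a lot, which is why a hard cap still seems worthwhile.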

Desired Solution you'd like

A simple solution would be a per-file chunk limit, e.g. a `MAX_CHUNKS_PER_FILE` env variable or UI setting, instead of the existing max file size limit.

A 5MB PDF document can produce far fewer chunks than a 1.5MB config file, so I don't want to limit by file size; a sketch of the guard I have in mind follows.
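
A hypothetical sketch of the proposed guard (the env variable name is the one proposed above; everything else is assumed, not Open WebUI's actual ingestion pipeline):

```python
# Hypothetical sketch of the proposed MAX_CHUNKS_PER_FILE guard.
# The variable name comes from this request; the surrounding
# pipeline is assumed, not Open WebUI's actual code.
import os

MAX_CHUNKS_PER_FILE = int(os.environ.get("MAX_CHUNKS_PER_FILE", "0"))  # 0 = no limit

class TooManyChunksError(ValueError):
    """Raised when a file splits into more chunks than allowed."""

def enforce_chunk_limit(chunks: list[str]) -> list[str]:
    if MAX_CHUNKS_PER_FILE and len(chunks) > MAX_CHUNKS_PER_FILE:
        raise TooManyChunksError(
            f"file produced {len(chunks)} chunks; "
            f"MAX_CHUNKS_PER_FILE is {MAX_CHUNKS_PER_FILE}"
        )
    return chunks
```

Whether an over-limit file should be rejected with an error or just truncated to the first allowed chunks is an open design question (the comment below raises exactly this).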

Alternatives Considered

I migrated from ChromaDB on a file share to PGVector, which helped a bit.

Additional Context

No response


@owui-terminator[bot] commented on GitHub (Dec 11, 2025):

🔍 Similar Issues Found

I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:

  1. #19595 feat: Intelligent Minimum Chunk Merging (Token & Character Thresholds) to Prevent RAG Fragmentation
    by Classic298 • Nov 29, 2025

  2. #19665 feat: use vert to avoid file upload
    by surak • Dec 01, 2025

  3. #3065 Quantized Embeddings
    by snadeem1362 • Jun 12, 2024


💡 Tips:

  • If this is a duplicate, please consider closing this issue and adding any additional details to the existing one
  • If you found a solution in any of these issues, please share it here to help others

This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.


@Classic298 commented on GitHub (Dec 11, 2025):

I don't understand - what would this max chunk size per file entail? Do you just want to not process part of the file if it results in too many chunks? Throw away the content?


Reference: github-starred/open-webui#57693