[GH-ISSUE #19884] feat: Limit files with a large number of chunks #34556
Originally created by @tomasloksa on GitHub (Dec 11, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19884
Check Existing Issues
Verify Feature Scope
Problem Description
Uploading files that are not necessarily large (say 1.5 MB) but contain a lot of characters/text, like log files or JSON, can produce a huge number of chunks (around 25,000).
I'm using an external embedding model, text-embedding-3-small, so it manages to process it all, but the problem appears when saving the embeddings into the vector DB. There the container often ran out of memory and got killed, or simply became unresponsive. Originally I was using ChromaDB on an Azure File Share, but even after migrating to PGVector the issue still persists (although it's generally better).
Desired Solution you'd like
A simple solution would be a maximum chunk limit per file, e.g. a MAX_CHUNKS_PER_FILE env variable or UI setting, instead of limiting the maximum file size.
A 5 MB PDF document can produce far fewer chunks than a 1.5 MB config file, so I don't want to limit the file size.
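A minimal sketch of how such a cap might be enforced right after the text splitter runs and before embedding. MAX_CHUNKS_PER_FILE and enforce_chunk_limit are hypothetical names from this request, not existing Open WebUI settings or code:

```python
import os

# Hypothetical setting; 0 means "no limit" (assumed default).
MAX_CHUNKS_PER_FILE = int(os.environ.get("MAX_CHUNKS_PER_FILE", "0"))


def enforce_chunk_limit(chunks: list[str], filename: str) -> list[str]:
    """Reject a file whose splitter output exceeds the configured chunk cap."""
    if MAX_CHUNKS_PER_FILE and len(chunks) > MAX_CHUNKS_PER_FILE:
        raise ValueError(
            f"{filename}: {len(chunks)} chunks exceed "
            f"MAX_CHUNKS_PER_FILE={MAX_CHUNKS_PER_FILE}; refusing to embed"
        )
    return chunks
```

Whether an over-limit file should be rejected outright or truncated to the first N chunks is an open design question for this request.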
Alternatives Considered
I migrated from ChromaDB on a file share to PGVector, which helped a bit.
Additional Context
No response
@owui-terminator[bot] commented on GitHub (Dec 11, 2025):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#19595 feat: Intelligent Minimum Chunk Merging (Token & Character Thresholds) to Prevent RAG Fragmentation
by Classic298 • Nov 29, 2025
#19665 feat: use vert to avoid file upload
by surak • Dec 01, 2025
#3065 Quantized Embeddings
by snadeem1362 • Jun 12, 2024
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@Classic298 commented on GitHub (Dec 11, 2025):
I don't understand - what would this max chunk limit per file entail? Do you just want to skip processing part of the file if it results in too many chunks? Throw away the content?