[GH-ISSUE #22158] feat: dynamic chunk size and overlap values #58311

Closed
opened 2026-05-05 22:52:19 -05:00 by GiteaMirror · 3 comments

Originally created by @rawaha-e on GitHub (Mar 2, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22158

Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

OpenWebUI currently only supports fixed CHUNK_SIZE and CHUNK_OVERLAP values when processing documents. This works for standard documents but is problematic in practice:

  • Small documents get over-chunked into many tiny segments
  • Larger documents (tens of thousands of tokens) generate hundreds of chunks, causing slowdowns, especially when full context mode is enabled
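To make the scale concrete, here is a rough sketch of how the chunk count grows under fixed settings. It assumes an idealized sliding window over tokens; real splitters break on separators, so actual counts will differ somewhat. The function name and the example values (1000 and 100) are illustrative assumptions, not necessarily Open WebUI's defaults.

```python
import math


def estimate_chunk_count(doc_tokens: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate how many chunks an idealized sliding window produces.

    Hypothetical helper for illustration only; not part of Open WebUI.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= chunk_size:
        return 1
    # Each chunk after the first advances by (chunk_size - chunk_overlap) tokens.
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((doc_tokens - chunk_size) / stride)


# Example values are assumptions chosen for illustration.
small = estimate_chunk_count(600, 200, 20)       # short document, small chunks
large = estimate_chunk_count(50_000, 200, 20)    # long document, same settings
print(small, large)
```

With the same fixed settings, the short document is still cut into several pieces while the long one produces hundreds of chunks, which is the mismatch described above.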

Desired Solution you'd like

Implement dynamic chunk sizing based on document length or token count, as follows:

  • Pre-tokenize the document to determine its size.
  • Adjust CHUNK_SIZE and CHUNK_OVERLAP according to document length.
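A minimal sketch of what the two steps above could look like, offered as one possible heuristic rather than a proposed implementation. Everything here is an assumption made for illustration: the names `ChunkParams`, `count_tokens`, and `dynamic_chunk_params`, the target of roughly 20 chunks per document, the 256/2048 size bounds, and the 10% overlap. The whitespace tokenizer is a stand-in; a real version would use the tokenizer that matches the embedding model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkParams:
    chunk_size: int
    chunk_overlap: int


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace-separated words.
    return len(text.split())


def dynamic_chunk_params(
    text: str,
    target_chunks: int = 20,    # assumed target, not an Open WebUI setting
    min_size: int = 256,        # assumed lower bound on chunk size
    max_size: int = 2048,       # assumed upper bound on chunk size
    overlap_percent: int = 10,  # assumed overlap as a share of chunk size
) -> ChunkParams:
    """Derive chunk size and overlap from the document's token count."""
    # Step 1: pre-tokenize the document to determine its size.
    n_tokens = count_tokens(text)

    # Step 2: aim for about target_chunks chunks, clamped to sane bounds.
    ideal_size = -(-n_tokens // target_chunks)  # ceiling division
    chunk_size = max(min_size, min(max_size, ideal_size))
    chunk_overlap = chunk_size * overlap_percent // 100
    return ChunkParams(chunk_size, chunk_overlap)


params = dynamic_chunk_params("word " * 10_000)
print(params)
```

Under these assumptions a document shorter than `min_size` tokens fits in a single chunk, while very long documents are capped at `max_size` so that individual chunks stay within what an embedding model can handle.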

Alternatives Considered

No response

Additional Context

No response

@Classic298 commented on GitHub (Mar 2, 2026):

why not use markdown header splitting with minimum chunk size merging? that's exactly what you're describing

@rawaha-e commented on GitHub (Mar 2, 2026):

> why not use markdown header splitting with minimum chunk size merging? that's exactly what you're describing

@Classic298 There are documents where markdown headers are not present, for example if I want to process an unstructured table. My feature request makes more sense when using character splitting.

@Classic298 commented on GitHub (Mar 2, 2026):

Aha, in this case this should be discussed first; this would be a massive feature.

Reference: github-starred/open-webui#58311