mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-05 18:38:17 -05:00
[GH-ISSUE #5762] Support for Additional Token Splitting Strategies in Text Preprocessing #14115
Originally created by @sir3mat on GitHub (Sep 27, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/5762
Currently, the text preprocessing step in OpenWebUI's RAG pipeline supports only the RecursiveCharacterTextSplitter. The chunks it produces are not optimal for every use case, which can degrade downstream processing and analysis.
Would it be feasible to introduce support for configurable token-splitting strategies, so that users can select or define a splitting method suited to their specific needs? Additionally, an option in the UI to call a custom text-splitting API during document upload would provide greater flexibility.
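To make the request concrete, here is a minimal sketch of what a configurable splitter selection could look like. It is not Open WebUI's actual code: the names `SPLITTERS`, `register`, `split_text`, and the two strategies are hypothetical, and the "token" strategy uses whitespace-separated words as a stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical registry mapping a strategy name to a splitter function.
# A splitter takes (text, chunk_size, overlap) and returns a list of chunks.
Splitter = Callable[[str, int, int], List[str]]
SPLITTERS: Dict[str, Splitter] = {}


def register(name: str) -> Callable[[Splitter], Splitter]:
    """Decorator that adds a splitter to the registry under `name`."""
    def decorator(fn: Splitter) -> Splitter:
        SPLITTERS[name] = fn
        return fn
    return decorator


def _windows(items: Sequence, size: int, overlap: int) -> list:
    """Return consecutive slices of `items` of length `size`, each
    sharing `overlap` elements with the previous slice."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    out = []
    for start in range(0, len(items), step):
        out.append(items[start:start + size])
        if start + size >= len(items):
            break  # the last window already reaches the end
    return out


@register("character")
def split_by_characters(text: str, chunk_size: int, overlap: int) -> List[str]:
    # Fixed-size character windows (slices of a str are already strings).
    return _windows(text, chunk_size, overlap)


@register("token")
def split_by_tokens(text: str, chunk_size: int, overlap: int) -> List[str]:
    # Assumption: whitespace-separated words stand in for model tokens.
    return [" ".join(w) for w in _windows(text.split(), chunk_size, overlap)]


def split_text(text: str, strategy: str = "character",
               chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Look up the configured strategy and apply it to `text`."""
    try:
        splitter = SPLITTERS[strategy]
    except KeyError:
        raise ValueError(f"unknown splitting strategy: {strategy!r}") from None
    return splitter(text, chunk_size, overlap)
```

With a registry like this, the strategy name could come from a settings field in the UI, and a custom-API strategy would be one more registered function that forwards the text to a user-supplied endpoint.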
What are your thoughts on this proposal?