[GH-ISSUE #5762] Support for Additional Token Splitting Strategies in Text Preprocessing #52781

Closed
opened 2026-05-05 13:54:01 -05:00 by GiteaMirror · 0 comments

Originally created by @sir3mat on GitHub (Sep 27, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/5762

Currently, the text preprocessing step in Open WebUI's RAG pipeline is limited to the RecursiveCharacterTextSplitter. The chunks it produces are not optimal for every use case, which can degrade downstream processing and analysis.

Would it be feasible to introduce support for configurable token-splitting strategies? This would let users select or define a splitting strategy that suits their specific needs. Additionally, a UI option to plug in a custom text-splitting API during document upload would provide greater flexibility.
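
To make the idea concrete, here is a minimal sketch of a strategy registry selected by name. Everything in it is hypothetical: `SPLITTERS`, `register`, `split_text`, and the strategy names `"character"` and `"token"` are illustrative and are not part of Open WebUI or LangChain. The "token" strategy counts whitespace-separated words as a stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical registry mapping a strategy name to a splitting function.
# None of these names come from Open WebUI; they only illustrate the proposal.
Splitter = Callable[[str, int, int], List[str]]
SPLITTERS: Dict[str, Splitter] = {}


def register(name: str) -> Callable[[Splitter], Splitter]:
    """Decorator that adds a splitting function to the registry."""
    def decorator(fn: Splitter) -> Splitter:
        SPLITTERS[name] = fn
        return fn
    return decorator


def _windows(items: Sequence, size: int, overlap: int) -> list:
    """Slice `items` into windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    out = []
    for start in range(0, len(items), step):
        out.append(items[start:start + size])
        if start + size >= len(items):
            break  # the last window already reaches the end
    return out


@register("character")
def split_by_characters(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    # Fixed-size character windows (simpler than a recursive splitter).
    return _windows(text, chunk_size, chunk_overlap)


@register("token")
def split_by_tokens(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    # Assumption: whitespace-separated words approximate tokens.
    words = text.split()
    return [" ".join(w) for w in _windows(words, chunk_size, chunk_overlap)]


def split_text(text: str, strategy: str = "character",
               chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Dispatch to the configured strategy; unknown names fail loudly."""
    if strategy not in SPLITTERS:
        raise ValueError(f"unknown strategy {strategy!r}; choose from {sorted(SPLITTERS)}")
    return SPLITTERS[strategy](text, chunk_size, chunk_overlap)
```

With a registry like this, a custom splitting API could be added as one more registered function that forwards the text to an external endpoint, and the UI would only need to expose the strategy name, chunk size, and overlap.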

What are your thoughts on this proposal?


Reference: github-starred/open-webui#52781