Mirror of https://github.com/open-webui/open-webui.git, synced 2026-05-07 19:38:46 -05:00
[GH-ISSUE #12059] feat: Add an embedding option that specifies document separator characters #31984
Originally created by @fgfg54321 on GitHub (Mar 26, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12059
Check Existing Issues
Problem Description
Add an embedding option that specifies document separator characters
Desired Solution you'd like
In retrieval.py, the `separators` option of `RecursiveCharacterTextSplitter` could be made configurable through an environment variable or the admin settings. This would make document segmentation more flexible.
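The separator list matters most for text that does not use spaces or newlines between sentences, such as Chinese. The sketch below is a simplified recursive splitter written only to illustrate the idea. It is not LangChain's implementation, which also handles chunk overlap and separator retention. It tries separators in order, splits on the first one found, and re-splits oversized pieces with the remaining, finer separators. The function name `recursive_split` is hypothetical.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters where possible.

    Simplified illustration only: no chunk overlap, and separators at chunk
    boundaries are dropped.
    """
    if not separators:
        return [text] if text else []
    # Use the first separator that occurs in the text ("" always matches and
    # means "split into single characters"); keep the rest for recursion.
    sep, rest = separators[-1], []
    for i, s in enumerate(separators):
        if s == "" or s in text:
            sep, rest = s, separators[i + 1:]
            break
    pieces = [p for p in (text.split(sep) if sep else list(text)) if p]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size and rest:
            # Piece is too big on its own: flush, then split it with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, rest, chunk_size))
            continue
        # Otherwise greedily merge pieces while the chunk stays within chunk_size.
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "第一句。第二句。第三句。"
# LangChain's default separators: this text has no newlines or spaces,
# so splitting falls back to single characters and cuts mid-sentence.
print(recursive_split(text, ["\n\n", "\n", " ", ""], chunk_size=6))
# Adding the Chinese full stop "。" lets chunks end on sentence boundaries.
print(recursive_split(text, ["\n\n", "\n", "。", ""], chunk_size=6))
```

With the default list, the first call returns `['第一句。第二', '句。第三句。']`. The second call returns `['第一句', '第二句', '第三句']`.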
For now, I modify the code manually at this point:
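from langchain.text_splitter import RecursiveCharacterTextSplitter

# Comma-separated separators, tried in order (includes the Chinese full stop "。").
separators_str = "\n\n\n\n,.,。"
# Drop only empty entries; calling .strip() on each entry would also
# discard whitespace-only separators such as "\n\n".
separators = [s for s in separators_str.split(",") if s]
text_splitter = RecursiveCharacterTextSplitter(
    separators=separators,  # New separator parameter
    chunk_size=request.app.state.config.CHUNK_SIZE,
    chunk_overlap=request.app.state.config.CHUNK_OVERLAP,
    add_start_index=True,
)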
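Here is a minimal sketch of the requested option. It assumes a hypothetical `CHUNK_SEPARATORS` environment variable, which is not an existing Open WebUI setting, holding a JSON array of strings. JSON is used here instead of a comma-separated string for two reasons: escapes such as `\n` are decoded, and a comma can itself be a separator. When the variable is unset, the function falls back to the default separators of `RecursiveCharacterTextSplitter`.

```python
import json
import os

# Default separators used by LangChain's RecursiveCharacterTextSplitter.
DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]


def load_separators(env_var: str = "CHUNK_SEPARATORS") -> list[str]:
    """Read chunk separators from a JSON array in an environment variable.

    CHUNK_SEPARATORS is a hypothetical variable name used for illustration.
    """
    raw = os.environ.get(env_var)
    if not raw:
        # Unset or empty: keep the splitter's usual behaviour.
        return list(DEFAULT_SEPARATORS)
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
        raise ValueError(f"{env_var} must be a JSON array of strings")
    # Whitespace-only entries such as "\n\n" are kept as-is (no stripping).
    return value
```

For example, setting `CHUNK_SEPARATORS='["\n\n", "\n", ".", "。"]'` in a shell would yield those four separators. The single quotes keep each `\n` as two characters, which JSON then decodes to a newline. The returned list could then be passed as `separators=` to `RecursiveCharacterTextSplitter` in place of the hardcoded string.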
Alternatives Considered
No response
Additional Context
No response