Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-06 10:58:17 -05:00)
[PR #21520] [CLOSED] fix: preserve header metadata in markdown splitter #41747
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/21520
Author: @Baireinhold
Created: 2/17/2026
Status: ❌ Closed
Base: main ← Head: fix/markdown-header-metadata
📝 Commits (1)
91507ef fix: preserve header metadata in markdown splitter
📊 Changes
1 file changed (+1 additions, -1 deletions)
📝 backend/open_webui/routers/retrieval.py (+1 -1)
📄 Description
Summary
Fixes header metadata loss in MarkdownHeaderTextSplitter output. Relates to #21486 (Bug 1).
Change
MarkdownHeaderTextSplitter.split_text() returns chunks whose metadata contains the header hierarchy, e.g.:
Currently, only the parent document's metadata is preserved:
This PR merges both:
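As an illustration only, the merge can be sketched with plain dictionaries standing in for the chunks' metadata. This is not the PR's actual one-line diff in retrieval.py. The helper name merge_chunk_metadata and the sample values are hypothetical, and the sketch assumes that header keys from the splitter win if a key name collides with parent metadata.

```python
from typing import Any


def merge_chunk_metadata(
    parent_metadata: dict[str, Any], chunk_metadata: dict[str, Any]
) -> dict[str, Any]:
    """Combine document-level metadata with one chunk's header metadata.

    Hypothetical helper. Assumption: on a key collision the chunk's
    header value overrides the parent's value.
    """
    # Build a new dict so the shared parent metadata is never mutated,
    # then overlay the per-chunk header hierarchy on top of it.
    return {**parent_metadata, **chunk_metadata}


# Hypothetical inputs: a parent document plus two header-split chunks whose
# keys follow a ("#", "Header 1"), ("##", "Header 2") splitter config.
parent = {"source": "guide.md", "file_id": "abc123"}
chunks = [
    ("Install steps...", {"Header 1": "Setup", "Header 2": "Install"}),
    ("Config steps...", {"Header 1": "Setup", "Header 2": "Configure"}),
]

# Before: every chunk gets a copy of the parent metadata only.
before = [dict(parent) for _text, _meta in chunks]
# After: each chunk also keeps the headers of its own section.
after = [merge_chunk_metadata(parent, meta) for _text, meta in chunks]

print(before[0] == before[1])  # True
print(after[0]["Header 2"], after[1]["Header 2"])  # Install Configure
```

Putting the parent dict first in the unpacking is what gives header keys precedence; reversing the order would let parent metadata win instead.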
Impact
ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER is disabled
Testing
Before: all chunks have identical metadata (parent doc only)
After: each chunk includes Header 1, Header 2, etc. from its section
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.