mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 02:48:13 -05:00
[GH-ISSUE #16529] feat: Markdown Chunkers #17945
Originally created by @Sank-WoT on GitHub (Aug 12, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/16529
Check Existing Issues
Problem Description
"Is there a way to include and pass the context of Markdown headers from metadata when chunking documents for retrieval? Specifically, I'm looking into how the header hierarchy (H1, H2, etc.) can be preserved and used to enrich chunk context during RAG. I checked the middleware code at https://github.com/open-webui/open-webui/blob/438e5d966f0f64f9ea3feab22724a5bd96a4127b/backend/open_webui/utils/middleware.py#L967 (lines L967 to L980), but couldn't find any implementation related to extracting or transmitting header metadata for chunks. Is this supported, or are there plans to include such functionality?"
Link to the headings list:
438e5d966f/backend/open_webui/routers/retrieval.py (L1220)
Desired Solution you'd like
Add the headings to the chunk context
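To illustrate the request, here is a minimal sketch of how a chunk's header hierarchy could be prepended to its text before it is placed in the prompt. The function name `with_heading_context`, the chunk layout (a dict with `content` and `metadata`), and the `Header N` metadata keys are assumptions made for this example, not Open WebUI's actual data model.

```python
def with_heading_context(chunk, separator=" > "):
    """Return the chunk text prefixed with its header breadcrumb.

    Assumes (hypothetically) that header titles are stored in the chunk
    metadata under keys such as "Header 1", "Header 2", and so on.
    """
    metadata = chunk.get("metadata", {})
    # Single-digit levels sort correctly as plain strings.
    keys = sorted(k for k in metadata if k.startswith("Header "))
    breadcrumb = separator.join(metadata[k] for k in keys)
    if not breadcrumb:
        # No header information available: return the text unchanged.
        return chunk["content"]
    return f"[{breadcrumb}]\n{chunk['content']}"


example = {
    "content": "Run the installer.",
    "metadata": {"Header 2": "Install", "Header 1": "Guide"},
}
print(with_heading_context(example))
```

With something like this, a retrieved chunk would carry its section path into the prompt even when the chunk body itself never mentions the section title.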
@rgaricano commented on GitHub (Aug 12, 2025):
Yes, it's supported: Admin Settings / Documents / Text Splitter: Markdown (Header)
@Sank-WoT commented on GitHub (Aug 13, 2025):
Sorry, but I don't see the header hierarchy being passed to the prompt
@rgaricano commented on GitHub (Aug 13, 2025):
No, it isn't passed to the prompt; the headers are processed when the file is embedded/vectorized:
438e5d966f/backend/open_webui/routers/retrieval.py (L1190-L1235)
Edit: I just saw your other issue https://github.com/open-webui/open-webui/issues/16558
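For readers unfamiliar with header-based splitting, the following is a rough, standard-library-only sketch of the general idea: split a Markdown document at its headers and record the active header hierarchy as metadata on each chunk. It is not the code at the link above; the function name `split_markdown_by_headers` and the `Header N` metadata keys are assumptions chosen for this example.

```python
import re

# ATX-style headers only ("# Title" through "###### Title").
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def split_markdown_by_headers(text):
    """Split Markdown into chunks tagged with their header hierarchy."""
    chunks = []
    path = {}        # header level -> title currently in effect
    buffer = []      # body lines collected under the current headers
    in_fence = False

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            metadata = {f"Header {lvl}": path[lvl] for lvl in sorted(path)}
            chunks.append({"content": body, "metadata": metadata})
        buffer.clear()

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside fenced code are never treated as headers.
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header closes every header at the same or a deeper level.
            for lvl in [l for l in path if l >= level]:
                del path[lvl]
            path[level] = match.group(2)
        else:
            buffer.append(line)
    flush()
    return chunks


doc = "# Guide\nintro\n## Install\nrun pip\n## Usage\ncall it\n"
for chunk in split_markdown_by_headers(doc):
    print(chunk["metadata"], "->", chunk["content"])
```

In this sketch the hierarchy lives only in the chunk metadata, which matches the behaviour described in this thread: the headers are available at embedding time but are not automatically part of the text sent to the prompt.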