[GH-ISSUE #16529] feat: Markdown Chunkers #17945

Closed
opened 2026-04-19 23:50:47 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @Sank-WoT on GitHub (Aug 12, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/16529

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

"Is there a way to include and pass the context of Markdown headers from metadata when chunking documents for retrieval? Specifically, I'm looking into how header hierarchy (like H1, H2, etc.) can be preserved and used to enrich chunk context during RAG. I checked the middleware code here — https://github.com/open-webui/open-webui/blob/438e5d966f0f64f9ea3feab22724a5bd96a4127b/backend/open_webui/utils/middleware.py#L967 L980— but couldn't find any implementation related to extracting or transmitting header metadata for chunks. Is this supported, or are there plans to include such functionality?"

Link on hedings list 438e5d966f/backend/open_webui/routers/retrieval.py (L1220)

Desired Solution you'd like

Add in context headings

Originally created by @Sank-WoT on GitHub (Aug 12, 2025). Original GitHub issue: https://github.com/open-webui/open-webui/issues/16529 ### Check Existing Issues - [x] I have searched the existing issues and discussions. ### Problem Description "Is there a way to include and pass the context of Markdown headers from metadata when chunking documents for retrieval? Specifically, I'm looking into how header hierarchy (like H1, H2, etc.) can be preserved and used to enrich chunk context during RAG. I checked the middleware code here — [https://github.com/open-webui/open-webui/blob/438e5d966f0f64f9ea3feab22724a5bd96a4127b/backend/open_webui/utils/middleware.py#L967 ](https://github.com/open-webui/open-webui/blob/438e5d966f0f64f9ea3feab22724a5bd96a4127b/backend/open_webui/utils/middleware.py#L967) L980— but couldn't find any implementation related to extracting or transmitting header metadata for chunks. Is this supported, or are there plans to include such functionality?" Link on hedings list https://github.com/open-webui/open-webui/blob/438e5d966f0f64f9ea3feab22724a5bd96a4127b/backend/open_webui/routers/retrieval.py#L1220 ### Desired Solution you'd like Add in context headings
Author
Owner

@rgaricano commented on GitHub (Aug 12, 2025):

yes, it's suported, adminSettings/Documents/Text Splitter: Markdown (Header)

<!-- gh-comment-id:3180230183 --> @rgaricano commented on GitHub (Aug 12, 2025): yes, it's suported, adminSettings/Documents/Text Splitter: Markdown (Header)
Author
Owner

@Sank-WoT commented on GitHub (Aug 13, 2025):

Sorry, but I don't see the header hierarchy being passed to the prompt

<!-- gh-comment-id:3182358554 --> @Sank-WoT commented on GitHub (Aug 13, 2025): Sorry, but I don't see the header hierarchy being passed to the prompt
Author
Owner

@rgaricano commented on GitHub (Aug 13, 2025):

No, isn't passed to prompt, its are processed when the file is embedded/vectorized

438e5d966f/backend/open_webui/routers/retrieval.py (L1190-L1235)

Edit: I just show your other issue https://github.com/open-webui/open-webui/issues/16558

<!-- gh-comment-id:3182555298 --> @rgaricano commented on GitHub (Aug 13, 2025): No, isn't passed to prompt, its are processed when the file is embedded/vectorized https://github.com/open-webui/open-webui/blob/438e5d966f0f64f9ea3feab22724a5bd96a4127b/backend/open_webui/routers/retrieval.py#L1190-L1235 Edit: I just show your other issue https://github.com/open-webui/open-webui/issues/16558
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/open-webui#17945