mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 10:58:17 -05:00
[PR #21488] enh: add backward merge for undersized chunks in merge_docs_to_target_size #26101
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/21488
Author: @Classic298
Created: 2/16/2026
Status: 🔄 Open
Base: dev ← Head: claude/improve-issue-efficiency-hB7AF
📝 Commits (6)
d05301d fix: add backward merge for undersized chunks in merge_docs_to_target_size
03dc985 test: add test document generator for Bug 3 backward merge validation
28ffa67 Delete test_merge_bug3.md
f8364b8 Delete generate_test_doc.py
777fa6c chore: convert merge_docs_to_target_size to async
fc722a6 fix: revert merge_docs_to_target_size to sync
📊 Changes
1 file changed (+48 additions, -10 deletions)
📝 backend/open_webui/routers/retrieval.py (+48 -10)
📄 Description
When a tiny chunk (e.g. an isolated heading line) sits between two large chunks, the forward-only merge strategy cannot absorb it: the preceding chunk is already above min_chunk_size_target, and the following chunk is too large for the combined size to fit within max_chunk_size. This leaves the tiny fragment as a standalone chunk.
Add a backward merge pass: before emitting an undersized chunk that failed to merge forward, attempt to append it to the previously emitted chunk (respecting source/file boundaries and max size). This also handles the case where the last chunk in the sequence is undersized.
Addresses #21486
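The behavior described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the `Chunk` class, the `merge_chunks` function, the `sep` separator, and measuring size in characters are all assumptions made for illustration. Only the names min_chunk_size_target and max_chunk_size come from the description, and they appear here as the hypothetical parameters `min_target` and `max_size`.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical stand-in for source/file metadata


def merge_chunks(chunks, min_target, max_size, sep="\n\n"):
    """Illustrative forward merge with a backward fallback (not the PR's code)."""

    def fits(a, b):
        # Merge only within one source, and only if the result stays within max_size.
        return a.source == b.source and len(a.text) + len(sep) + len(b.text) <= max_size

    def join(a, b):
        return Chunk(a.text + sep + b.text, a.source)

    out = []

    def emit(chunk):
        # Backward merge: an undersized chunk that could not grow forward
        # is appended to the previously emitted chunk when that is allowed.
        if len(chunk.text) < min_target and out and fits(out[-1], chunk):
            out[-1] = join(out[-1], chunk)
        else:
            out.append(chunk)

    current = None
    for nxt in chunks:
        if current is None:
            current = nxt
        elif len(current.text) < min_target and fits(current, nxt):
            current = join(current, nxt)  # forward merge
        else:
            emit(current)
            current = nxt
    if current is not None:
        emit(current)  # also covers an undersized final chunk
    return out


# A tiny heading between two large chunks: it cannot merge forward
# (9 + 2 + 995 > 1000), so it is appended to the preceding chunk instead.
docs = [Chunk("A" * 900, "f"), Chunk("# Heading", "f"), Chunk("B" * 995, "f")]
merged = merge_chunks(docs, min_target=200, max_size=1000)
```

In this sketch the heading ends up at the tail of the first chunk rather than as a standalone fragment. If the heading came from a different source, or if appending it would exceed `max_size`, it would still be emitted on its own.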
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.