[PR #23789] fix(files): preserve KB embeddings when reindex fails #98406

Open
opened 2026-05-16 01:12:15 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/23789
Author: @shaun0927
Created: 4/16/2026
Status: 🔄 Open

Base: dev ← Head: fix-file-reindex-loss


📝 Commits (2)

  • b2fce41 Prevent silent KB vector loss when file reindexing fails
  • 5c3b958 Respect maintainer preference against new PR test files

📊 Changes

2 files changed (+42 additions, -6 deletions)

📝 backend/open_webui/routers/files.py (+10 -6)
➕ backend/open_webui/utils/knowledge_collections.py (+32 -0)

📄 Description

A file content update should not leave knowledge collections emptier than before. The current /files/{id}/data/content/update flow deletes the old KB vectors first and only then tries to rebuild them, so a rebuild failure can leave the collection empty even though the route still returns success.

This PR changes the propagation step to:

  • query the old vector ids for the file in each knowledge collection
  • rebuild the knowledge entry first
  • delete the stale ids only after the rebuild succeeds

That preserves the previously indexed chunks if the rebuild fails.
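
The ordering described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `InMemoryVectorStore`, `reindex_file`, and the `rebuild` callback are hypothetical names, and the in-memory store only stands in for whatever vector database client the backend actually uses.

```python
from typing import Callable, Dict, List


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB client (not Open WebUI's API)."""

    def __init__(self) -> None:
        # collection name -> {vector id -> metadata}
        self.collections: Dict[str, Dict[str, dict]] = {}

    def ids_for_file(self, collection: str, file_id: str) -> List[str]:
        items = self.collections.get(collection, {})
        return [vid for vid, meta in items.items() if meta["file_id"] == file_id]

    def add(self, collection: str, vectors: Dict[str, dict]) -> None:
        self.collections.setdefault(collection, {}).update(vectors)

    def delete(self, collection: str, ids: List[str]) -> None:
        items = self.collections.get(collection, {})
        for vid in ids:
            items.pop(vid, None)


def reindex_file(
    store: InMemoryVectorStore,
    collection: str,
    file_id: str,
    rebuild: Callable[[str], Dict[str, dict]],
) -> bool:
    """Rebuild a file's vectors, deleting the old ones only on success."""
    # Step 1: remember which vector ids currently belong to the file.
    stale_ids = store.ids_for_file(collection, file_id)

    # Step 2: rebuild first. On failure, nothing has been deleted yet.
    try:
        new_vectors = rebuild(file_id)
        store.add(collection, new_vectors)
    except Exception:
        return False

    # Step 3: delete the stale ids, skipping any id the rebuild reused.
    new_ids = set(new_vectors)
    store.delete(collection, [vid for vid in stale_ids if vid not in new_ids])
    return True
```

In this sketch a failed rebuild returns `False` and leaves the old vectors untouched, so the caller can report the failure instead of silently returning success.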

Fixes #23787

Testing

  • python3 -m py_compile backend/open_webui/routers/files.py backend/open_webui/utils/knowledge_collections.py
  • PYTHONPATH=backend PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q backend/open_webui/test/util/test_knowledge_collections.py
  • local source-faithful repro of the old failure mode showing that delete-first ordering could leave the KB empty after a rebuild failure
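
The old failure mode in the last bullet can be illustrated with a toy model. This is not the author's repro: the dict-based `kb`, `delete_first_update`, and `failing_rebuild` are hypothetical, and the swallowed exception only models the description's statement that the route still returns success.

```python
from typing import Callable, Dict


def failing_rebuild(file_id: str) -> Dict[str, str]:
    """Simulates a reindex that fails, e.g. because embedding is unavailable."""
    raise RuntimeError("rebuild failed")


def delete_first_update(
    kb: Dict[str, str],
    file_id: str,
    rebuild: Callable[[str], Dict[str, str]],
) -> bool:
    """Old ordering: delete the file's vectors, then try to rebuild them."""
    # kb maps vector id -> owning file id.
    for vid in [v for v, owner in kb.items() if owner == file_id]:
        del kb[vid]
    try:
        kb.update(rebuild(file_id))
    except Exception:
        pass  # Assumption: the error is swallowed, so the caller sees success.
    return True


kb = {"v1": "file-a", "v2": "file-a"}
ok = delete_first_update(kb, "file-a", failing_rebuild)
# The call reports success, yet the file's vectors are gone from kb.
```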

Changelog Entry

Description

  • preserve old KB vectors during file content updates until the replacement reindex succeeds

Added

  • focused regression tests for the helper that reindexes a file into a knowledge collection

Fixed

  • file content updates no longer delete existing KB vectors before a successful rebuild is in place

Additional Information

  • This is not the stale-embedding case from #20558; it fixes the opposite failure mode where the old embeddings are deleted and the rebuild fails.

Contributor License Agreement

  • [X] By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA) (https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
GiteaMirror added the pull-request label 2026-05-16 01:12:15 -05:00
Reference: github-starred/open-webui#98406