[PR #21105] [MERGED] fix: prevent RuntimeError in process_metadata when excluding keys #25931

Closed
opened 2026-04-20 06:12:57 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/21105
Author: @Classic298
Created: 2/2/2026
Status: Merged
Merged: 2/13/2026
Merged by: @tjbck

Base: dev ← Head: patch-3


📝 Commits (1)

  • 119c229 Update utils.py

📊 Changes

1 file changed (+10 additions, -10 deletions)

📝 backend/open_webui/retrieval/vector/utils.py (+10 -10)

📄 Description

Fixes the process_metadata function, which raised RuntimeError: dictionary changed size during iteration whenever the metadata contained any of the excluded keys (content, pages, tables, paragraphs, sections, figures).

The function deleted keys while iterating over dict.items(). In Python 3 this changes the dict's size mid-iteration, and the iterator raises on its next step. The function now builds a new dict instead of mutating the original.
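The failure mode and the fix pattern can be sketched as follows. This is a minimal illustration, not the code from utils.py: the function names `strip_keys_buggy` and `strip_keys_fixed` are hypothetical, and only the excluded key names are taken from the description above.

```python
# Key names taken from the PR description; everything else is illustrative.
EXCLUDED_KEYS = {"content", "pages", "tables", "paragraphs", "sections", "figures"}


def strip_keys_buggy(metadata: dict) -> dict:
    """Hypothetical pre-fix pattern: delete from the dict being iterated."""
    for key, _value in metadata.items():
        if key in EXCLUDED_KEYS:
            # Shrinks the dict while its items() view is still being iterated.
            del metadata[key]
    return metadata


def strip_keys_fixed(metadata: dict) -> dict:
    """Hypothetical post-fix pattern: build a new dict, leave the input alone."""
    return {k: v for k, v in metadata.items() if k not in EXCLUDED_KEYS}


sample = {"source": "report.pdf", "pages": [1, 2], "author": "n/a"}

try:
    strip_keys_buggy(dict(sample))  # pass a copy so `sample` stays intact
    buggy_raised = False
except RuntimeError:
    buggy_raised = True

cleaned = strip_keys_fixed(sample)
```

Building a new dict also means the caller's metadata is left untouched, which the in-place version could not guarantee.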

Why this has never been reported

This bug has never surfaced in practice due to how the code paths are structured:

  1. Multitenancy mode users - The multitenancy Milvus client (and others) do not call process_metadata at all; they pass metadata through without processing.

  2. Standard mode users - Metadata passes through filter_metadata (in process_file) before it reaches process_metadata, and filter_metadata already removes the excluded keys. By the time process_metadata is called, those keys are gone.

  3. Document loaders - The custom loaders (MinerU, Mistral OCR, Datalab Marker, Docling, Tika) don't include these keys in their metadata output. Only certain LangChain loaders like AzureAIDocumentIntelligenceLoader might return them, but they get filtered upstream.
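The masking effect described in point 2 can be sketched as follows. Both functions are hypothetical stand-ins, assumed for illustration only: `upstream_filter` plays the role of filter_metadata and `delete_while_iterating` plays the role of the pre-fix process_metadata. Neither reproduces the real implementations.

```python
EXCLUDED_KEYS = {"content", "pages", "tables", "paragraphs", "sections", "figures"}


def upstream_filter(metadata: dict) -> dict:
    # Stand-in for the upstream filtering step: returns a copy without excluded keys.
    return {k: v for k, v in metadata.items() if k not in EXCLUDED_KEYS}


def delete_while_iterating(metadata: dict) -> dict:
    # Stand-in for the pre-fix behaviour: mutates the dict during iteration.
    for key, _value in metadata.items():
        if key in EXCLUDED_KEYS:
            del metadata[key]
    return metadata


raw = {"source": "report.pdf", "pages": [1, 2], "author": "n/a"}

# Standard path: the excluded keys are removed first, so the deletion
# branch never runs and the latent bug stays hidden.
safe = delete_while_iterating(upstream_filter(raw))

# Direct path: no prior filtering, so the deletion happens mid-iteration.
direct_error = None
try:
    delete_while_iterating(dict(raw))
except RuntimeError as exc:
    direct_error = str(exc)
```

In this sketch the standard path completes normally, while the direct call fails with the error message quoted in the description.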

Why it should still be fixed

Despite being unreachable in current code paths, this is a latent bug that could cause failures if:

  • A new code path calls process_metadata directly without prior filtering
  • A new document loader returns metadata with these keys
  • Someone refactors the existing flow and removes the upstream filter_metadata call

Fixing it ensures the function works correctly in isolation, as its signature and docstring imply it should.

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 06:12:57 -05:00
Reference: github-starred/open-webui#25931