[PR #21733] [CLOSED] fix: deduplicate RAG content injection when builtin tools execute #65105

Closed
opened 2026-05-06 10:53:02 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/21733
Author: @Classic298
Created: 2/22/2026
Status: Closed

Base: devHead: fix-rag


📝 Commits (3)

  • ad9a98e fix: deduplicate RAG content injection when builtin tools execute
  • 8b26b72 Update middleware.py
  • 0460fa7 Update middleware.py

📊 Changes

1 file changed (+96 additions, -8 deletions)

View changed files

📝 backend/open_webui/utils/middleware.py (+96 -8)

📄 Description

Fixes: https://github.com/open-webui/open-webui/issues/21726

fix: deduplicate RAG content injection when builtin tools execute

When builtin tools (view_knowledge_file, query_knowledge_files, search_web, fetch_url) execute during a chat, the model receives the same content multiple times:

  • With file context OFF: content appears 2x (post-tool RAG source injection into the user/system message + tool result message)
  • With file context ON: content appears up to 3x (initial RAG injection + post-tool RAG injection + tool result message)

The fix addresses duplication via two mechanisms while preserving both citation functionality and agentic tool compatibility:

  1. Reference-only RAG injection: After a tool call, the post-tool RAG source tags carry only the source ID and name for citation mapping, not the full content (which is already in the tool result message). This preserves the [1], [2] citation format while avoiding content duplication.

  2. Source deduplication: After the initial RAG pass (when file context is enabled), all injected source/file IDs are tracked. When tools return citation sources already present from the initial pass, they are filtered out before post-tool RAG injection. Newly injected sources are tracked for subsequent tool call rounds.

Additionally, the repeated inline list of citation tool names is extracted into a module-level CITATION_TOOL_NAMES constant, and the repeated source ID collection logic is extracted into a collect_source_ids helper function.

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/open-webui/open-webui/pull/21733 **Author:** [@Classic298](https://github.com/Classic298) **Created:** 2/22/2026 **Status:** ❌ Closed **Base:** `dev` ← **Head:** `fix-rag` --- ### 📝 Commits (3) - [`ad9a98e`](https://github.com/open-webui/open-webui/commit/ad9a98e7b6da8fe8b3dd6e63ebaf24ee32cefeea) fix: deduplicate RAG content injection when builtin tools execute - [`8b26b72`](https://github.com/open-webui/open-webui/commit/8b26b72aa9a89555e8b29b8c8a10bd2818ad6e63) Update middleware.py - [`0460fa7`](https://github.com/open-webui/open-webui/commit/0460fa7d26cbef0c40860c132d2577a3202bc474) Update middleware.py ### 📊 Changes **1 file changed** (+96 additions, -8 deletions) <details> <summary>View changed files</summary> 📝 `backend/open_webui/utils/middleware.py` (+96 -8) </details> ### 📄 Description Fixes: https://github.com/open-webui/open-webui/issues/21726 fix: deduplicate RAG content injection when builtin tools execute When builtin tools (view_knowledge_file, query_knowledge_files, search_web, fetch_url) execute during a chat, the model receives the same content multiple times: - With file context OFF: content appears 2x (post-tool RAG source injection into the user/system message + tool result message) - With file context ON: content appears up to 3x (initial RAG injection + post-tool RAG injection + tool result message) The fix addresses duplication via two mechanisms while preserving both citation functionality and agentic tool compatibility: 1. Reference-only RAG injection: After a tool call, the post-tool RAG source tags carry only the source ID and name for citation mapping, not the full content (which is already in the tool result message). This preserves the [1], [2] citation format while avoiding content duplication. 2. Source deduplication: After the initial RAG pass (when file context is enabled), all injected source/file IDs are tracked. When tools return citation sources already present from the initial pass, they are filtered out before post-tool RAG injection. Newly injected sources are tracked for subsequent tool call rounds. Additionally, the repeated inline list of citation tool names is extracted into a module-level CITATION_TOOL_NAMES constant, and the repeated source ID collection logic is extracted into a collect_source_ids helper function. ### Contributor License Agreement <!-- 🚨 DO NOT DELETE THE TEXT BELOW 🚨 Keep the "Contributor License Agreement" confirmation text intact. Deleting it will trigger the CLA-Bot to INVALIDATE your PR. --> By submitting this pull request, I confirm that I have read and fully agree to the [Contributor License Agreement (CLA)](https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms. > [!NOTE] > Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
GiteaMirror added the pull-request label 2026-05-06 10:53:02 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/open-webui#65105