[PR #19602] [MERGED] fix: Update milvus.py #40891

Closed
opened 2026-04-25 13:16:31 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/19602
Author: @Classic298
Created: 11/30/2025
Status: Merged
Merged: 12/2/2025
Merged by: @tjbck

Base: dev ← Head: milvus-test


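📝 Commits (10+)

  • fe6783c Merge pull request #19030 from open-webui/dev
  • fc05e0a Merge pull request #19405 from open-webui/dev
  • e3faec6 Merge pull request #19416 from open-webui/dev
  • 9899293 Merge pull request #19448 from open-webui/dev
  • 140605e Merge pull request #19462 from open-webui/dev
  • e9d0cb0 Update milvus.py
  • 03d7acf Update milvus.py
  • 5f4d13f Update milvus.py
  • 7eba3e2 Update milvus.py
  • 319e0c3 Merge branch 'open-webui:main' into milvus-test
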
📊 Changes

1 file changed (+17 additions, -15 deletions)


📝 backend/open_webui/retrieval/vector/dbs/milvus.py (+17 -15)

📄 Description

  • Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
  • Description: Provide a concise description of the changes made in this pull request down below.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: If necessary, update relevant documentation Open WebUI Docs like environment variables, the tutorials, or other documentation sources.
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Take this as an opportunity to make screenshots of the feature/fix and include them in the PR description.
  • Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • fix: Bug fix or error correction

Changelog Entry

Description

Fixes: https://github.com/open-webui/open-webui/discussions/18119
Fixes: https://github.com/open-webui/open-webui/discussions/16345
Fixes: https://github.com/open-webui/open-webui/issues/17088
Fixes: https://github.com/open-webui/open-webui/issues/18485

Why This Was Necessary (The Problem)

Root Cause: Milvus's query_iterator() method has a bug where it ignores JSON metadata field filters.
Evidence from Testing:
When querying for metadata["hash"] == "abc123...":

  • query_iterator() returned ALL documents in the collection (e.g., 16, 38, 42 results)
  • ZERO of those results actually had the matching hash
  • This caused false "duplicate content detected" errors

Why It Manifested:

When uploading a second file to a knowledge base:

  • System queries: metadata["hash"] == "hash_of_file2"
  • query_iterator() returns ALL documents (from file1)
  • Duplicate detection sees non-empty results
  • Falsely rejects file2 as a duplicate
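A minimal sketch of why an unfiltered result trips duplicate detection. `is_duplicate` and the two query stand-ins are hypothetical names, not the code in milvus.py; the only behavior carried over from the description is that any non-empty result for the hash is treated as a duplicate.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
# Hypothetical query signature: takes a metadata filter, returns matching docs.
QueryFn = Callable[[Dict[str, str]], List[Doc]]

# One chunk already stored from file1.
stored: List[Doc] = [{"id": "a", "metadata": {"hash": "hash_of_file1"}}]


def is_duplicate(query: QueryFn, file_hash: str) -> bool:
    # Any non-empty result is read as "this content is already in the knowledge base".
    return len(query({"hash": file_hash})) > 0


def filtering_query(metadata_filter: Dict[str, str]) -> List[Doc]:
    # Correct behavior: only docs whose metadata matches every key are returned.
    return [
        d for d in stored
        if all(d["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]


def filter_ignoring_query(metadata_filter: Dict[str, str]) -> List[Doc]:
    # Mimics the reported query_iterator() behavior: the filter is dropped.
    return list(stored)


print(is_duplicate(filtering_query, "hash_of_file2"))        # False
print(is_duplicate(filter_ignoring_query, "hash_of_file2"))  # True (false duplicate)
```

With the filter honored, file2 is accepted; with the filter ignored, file1's chunk alone is enough to reject it.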

How The Fix Works (The Mechanism)

1. Proper String Quote Handling
Milvus requires string values in filter expressions to be explicitly quoted:

```
# Wrong (before: every value was serialized with json.dumps):
metadata["hash"] == "abc123"

# Right (what we now do):
metadata["hash"] == "abc123"  # strings: quoted explicitly
metadata["count"] == 5        # numbers: no quotes
```

By checking isinstance(value, str), we add quotes only when needed.
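A minimal sketch of the type-dependent quoting described above. `build_filter_expression` is a hypothetical helper, not the function in milvus.py. It assumes string values contain no embedded double quotes (true for hex hashes) and joins multiple clauses with `&&`, which Milvus boolean expressions accept as logical AND.

```python
from typing import Any, Mapping


def build_filter_expression(metadata_filter: Mapping[str, Any]) -> str:
    """Build a Milvus boolean expression with one equality clause per metadata key."""
    clauses = []
    for key, value in metadata_filter.items():
        if isinstance(value, str):
            # String literals must be quoted (assumes no embedded double quotes).
            clauses.append(f'metadata["{key}"] == "{value}"')
        else:
            # Numbers go in bare so Milvus compares them as numbers.
            clauses.append(f'metadata["{key}"] == {value}')
    return " && ".join(clauses)


print(build_filter_expression({"hash": "abc123"}))  # metadata["hash"] == "abc123"
print(build_filter_expression({"count": 5}))        # metadata["count"] == 5
```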

2. Direct Query Method

The collection.query() method (as opposed to query_iterator()):

  • Properly applies JSON metadata filters
  • Returns results synchronously in a single call
  • Respects the filter expression exactly

We confirmed this works because:

  • The multitenancy implementation uses query() and has NO issues
  • Testing showed query() returns 0 results when no hash matches (correct!)
  • Testing showed query_iterator() returns all documents, ignoring the filter (broken!)
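A sketch of the single-call pattern, assuming a pymilvus-style `Collection` whose `query()` takes `expr`, `output_fields`, and `limit` keyword arguments. `fetch_by_filter` and `FakeCollection` are hypothetical; the fake is an in-memory stand-in so the example runs without a Milvus server, and the output field names are assumptions about the collection schema.

```python
from typing import Any, Dict, List, Protocol


class SupportsQuery(Protocol):
    """Assumed shape of a pymilvus-style Collection; only query() is used."""

    def query(self, expr: str, **kwargs: Any) -> List[Dict[str, Any]]: ...


def fetch_by_filter(
    collection: SupportsQuery, filter_expr: str, limit: int
) -> List[Dict[str, Any]]:
    # query() needs a positive limit (see "Limit Adjustment").
    if limit <= 0:
        raise ValueError("limit must be positive for query()")
    # One synchronous call returns the full result list: no iterator to drain or close.
    return collection.query(
        expr=filter_expr,
        output_fields=["id", "data", "metadata"],
        limit=limit,
    )


class FakeCollection:
    """In-memory stand-in: a row matches if its quoted hash appears in the expression."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def query(self, expr: str, **kwargs: Any) -> List[Dict[str, Any]]:
        hits = [r for r in self.rows if f'"{r["metadata"]["hash"]}"' in expr]
        return hits[: kwargs.get("limit", 16384)]


coll = FakeCollection([{"id": "a", "data": "chunk of file1", "metadata": {"hash": "hash_of_file1"}}])
print(len(fetch_by_filter(coll, 'metadata["hash"] == "hash_of_file2"', 16384)))  # 0
print(len(fetch_by_filter(coll, 'metadata["hash"] == "hash_of_file1"', 16384)))  # 1
```

An empty list here is the "no duplicate" signal the upload path relies on.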

3. Limit Adjustment
query() requires a positive limit, while query_iterator() accepted -1 (unlimited):
limit=limit if limit > 0 else 16384 # Milvus max limit
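The same rule as a small standalone function. `normalize_limit` is a hypothetical name; unlike the one-liner above, it also treats `None` as unlimited, which is an assumption about how callers might signal "no limit" (the one-liner would raise `TypeError` on `None`).

```python
from typing import Optional

# The value the PR passes when no positive limit is given ("Milvus max limit").
MILVUS_MAX_QUERY_LIMIT = 16384


def normalize_limit(limit: Optional[int]) -> int:
    """Map "unlimited" markers to the max window; keep positive limits as-is."""
    # query_iterator() accepted -1 for "no limit"; query() needs a positive int.
    # None is also treated as unlimited here (an assumption of this sketch).
    if limit is None or limit <= 0:
        return MILVUS_MAX_QUERY_LIMIT
    return limit


print(normalize_limit(-1))    # 16384
print(normalize_limit(None))  # 16384
print(normalize_limit(50))    # 50
```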


Additional Information

Tested locally

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 13:16:31 -05:00
Reference: github-starred/open-webui#40891