[GH-ISSUE #15249] issue: Race condition in opensearch vector db implementation #33036

Closed
opened 2026-04-25 06:54:05 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @dlamoris on GitHub (Jun 24, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/15249

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.15

Ollama Version (if applicable)

0.9.2

Operating System

macOS

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

I'm using an external custom content extraction engine that returns multiple documents for a single file upload (one reason is to take advantage of metadata.page support in citations), with OpenSearch as the vector DB.

When I upload a file to a knowledge base (in this case, the only file in the knowledge base) and then reference either the file or the knowledge base in a chat, I expect similar citation results for the same query.

Actual Behavior

When referencing the single file, results and citations come up as expected and I see the correct chunks and page number.

When referencing the knowledge base, different chunks and citations are shown, and they do not match the results from referencing the single file.

Steps to Reproduce

This is caused by a race condition in the OpenSearch implementation when a file is added to a knowledge base: the file is uploaded and processed first, and then a second call is made immediately to add the file ID to the knowledge base.

That second call queries the vector DB for the file's indexed documents so they can be added to the knowledge base collection. If OpenSearch has not yet made those documents searchable, the query returns nothing, and a single fallback document is created from the content stored in the database.

In process_file in open_webui/routers/retrieval.py, line 1311 (at the time of this issue):

        elif form_data.collection_name:
            # Check if the file has already been processed and save the content
            # Usage: /knowledge/{id}/file/add, /knowledge/{id}/file/update

            result = VECTOR_DB_CLIENT.query(
                collection_name=f"file-{file.id}", filter={"file_id": file.id}
            )

            if result is not None and len(result.ids[0]) > 0:
                docs = [
                    Document(
                        page_content=result.documents[0][idx],
                        metadata=result.metadatas[0][idx],
                    )
                    for idx, id in enumerate(result.ids[0])
                ]
            else:
                docs = [
                    Document(
                        page_content=file.data.get("content", ""),
                        metadata={
                            **file.meta,
                            "name": file.filename,
                            "created_by": file.user_id,
                            "file_id": file.id,
                            "source": file.filename,
                        },
                    )
                ]

            text_content = file.data.get("content", "")
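
The race can be illustrated without a running cluster. The sketch below is a toy model, not Open WebUI or OpenSearch code: `ToyIndex` and `docs_for_knowledge_base` are hypothetical names, and the only behavior assumed from OpenSearch is that newly indexed documents become searchable after an index refresh rather than at write time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy near-real-time index: writes are buffered until refresh() is called."""

    pending: list = field(default_factory=list)
    searchable: list = field(default_factory=list)

    def index(self, doc: dict) -> None:
        # The write is accepted but not yet visible to queries.
        self.pending.append(doc)

    def refresh(self) -> None:
        # Make all buffered writes visible to queries.
        self.searchable.extend(self.pending)
        self.pending.clear()

    def query(self, file_id: str) -> list:
        return [d for d in self.searchable if d["file_id"] == file_id]


def docs_for_knowledge_base(index: ToyIndex, file_id: str, stored_content: str) -> list:
    """Reuse the indexed chunks if the query finds any, else build one fallback document."""
    hits = index.query(file_id)
    if hits:
        return hits
    # Fallback: a single document with no per-page metadata.
    return [{"file_id": file_id, "page": None, "text": stored_content}]


index = ToyIndex()
for n in (1, 2, 3):
    index.index({"file_id": "f1", "page": n, "text": f"page {n}"})

# Queried right after indexing: nothing is visible yet, so the fallback is used.
before = docs_for_knowledge_base(index, "f1", "page 1 page 2 page 3")

# Queried after a refresh: the three per-page chunks are found.
index.refresh()
after = docs_for_knowledge_base(index, "f1", "page 1 page 2 page 3")

print(len(before), len(after))  # 1 3
```

In this model the knowledge base ends up with one page-less document when the query wins the race, which matches the mismatched chunks and citations described under Actual Behavior.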

Logs & Screenshots

Calls made by the browser when uploading a file to a knowledge base:

Image: https://github.com/user-attachments/assets/6d4c8636-bba3-4c79-9981-cab0685a92e1

Additional Information

I will open a PR that forces OpenSearch to refresh its indexes whenever a collection changes, so that search queries work immediately.
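
A minimal sketch of what such a change could look like, assuming a client that behaves like an opensearch-py `OpenSearch` instance (`client.index(...)` and `client.indices.refresh(...)`). The helper name `index_chunks_and_refresh` and the index name `file-abc` are hypothetical, and this is not the code from the eventual PR. A `MagicMock` stands in for the client so the example runs without a cluster.

```python
from typing import Any, Iterable
from unittest.mock import MagicMock


def index_chunks_and_refresh(client: Any, index_name: str, chunks: Iterable[dict]) -> int:
    """Index each chunk, then force a refresh so the next search can see them."""
    count = 0
    for chunk in chunks:
        client.index(index=index_name, body=chunk)
        count += 1
    # Explicit refresh: makes the documents written above visible to search.
    client.indices.refresh(index=index_name)
    return count


# Stand-in client; a real deployment would pass an OpenSearch client instead.
client = MagicMock()
written = index_chunks_and_refresh(
    client, "file-abc", [{"text": "page 1"}, {"text": "page 2"}]
)
print(written, client.indices.refresh.call_count)  # 2 1
```

Refreshing once after the batch, rather than after every document, keeps the extra cost small. Setting the `refresh` parameter on the write request itself (for example `wait_for`) is another option worth evaluating.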

GiteaMirror added the bug label 2026-04-25 06:54:05 -05:00

Reference: github-starred/open-webui#33036