Documents Indexing [ Collection already exists ] #1283

GiteaMirror · 2025-11-11T14:41:48-06:00

Originally created by @AhmadMuj on GitHub (Jun 17, 2024).

Bug Report

Description

Bug Summary:
I get an exception when I try to index (add) documents to the workspace collection.

Steps to Reproduce:
I think the cause is that I added this document earlier and the indexing never finished, so now I can't add the same document again because part of its collection already exists.

Expected Behavior:
Large documents (2000+ pages) should be handled through some kind of indexing queue, instead of the document staying invisible until it is fully indexed (which can take up to a few hours in my case).

Actual Behavior:
The document does not appear until it is fully indexed, so I can't tell what its current status is.

Environment

Open WebUI Version: 0.35
Operating System: Windows [ Client not host ]
Browser (if applicable): Chromium 126.0.6478.71

Reproduction Details

Confirmation:

I have read and followed all the instructions provided in the README.md.
I am on the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
Nothing is logged in the browser; the request times out because the server takes too long to respond.

Docker Container Logs:

metadata={'source': '/app/backend/data/uploads/RSS-D15', 'page': 128, 'start_index': 41})] 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772
ERROR:apps.rag.main:Collection 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772 already exists
Traceback (most recent call last):
  File "/app/backend/apps/rag/main.py", line 938, in store_docs_in_vector_db
    collection = CHROMA_CLIENT.create_collection(name=collection_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/api/client.py", line 198, in create_collection
    return self._server.create_collection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/api/segment.py", line 173, in create_collection
    coll, created = self._sysdb.create_collection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/db/mixins/sysdb.py", line 220, in create_collection
    raise UniqueConstraintError(f"Collection {name} already exists")
chromadb.db.base.UniqueConstraintError: Collection 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772 already exists
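
The traceback above ends in `create_collection` rejecting a name that is already taken, which fits the theory of a leftover collection from an interrupted run. One possible way to recover is to drop the stale collection before creating it again. The helper below is only a sketch under that assumption: `recreate_collection` is a hypothetical name, not a function from Open WebUI, and `client` is assumed to be a Chroma client exposing `delete_collection(name=...)` and `create_collection(name=...)`.

```python
def recreate_collection(client, collection_name: str):
    """Drop any leftover collection with this name, then create a fresh one.

    Hypothetical helper: `client` is assumed to behave like a Chroma client.
    """
    try:
        # Remove a collection left behind by an earlier, interrupted indexing run.
        client.delete_collection(name=collection_name)
    except Exception:
        # Assumption: the error raised for a missing collection may differ
        # between chromadb versions, so this sketch catches broadly and
        # treats any failure here as "nothing to delete".
        pass
    # With the stale collection gone, creation should no longer collide.
    return client.create_collection(name=collection_name)
```

If the partially indexed chunks should be kept rather than discarded, `get_or_create_collection(name=...)` on the Chroma client is an alternative that returns the existing collection instead of raising, though re-adding the same chunks could then produce duplicates unless their ids are stable.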

Installation Method

Docker

Additional Information

The underlying problem is the assumption that document indexing finishes instantly. I have documents of up to 4000 pages that I would like to index, and the only way I can see to do that is to add a job queue for indexing and show the document as "under processing" in the meantime.
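
To make the request concrete, here is a minimal sketch of what such a queue could look like, using only the Python standard library. Everything in it is hypothetical (`IndexingQueue`, `index_fn`, and the status strings are illustration names, not part of Open WebUI): jobs are processed one at a time on a background thread, and each document has a status that a UI could poll.

```python
import queue
import threading
from typing import Callable, Dict


class IndexingQueue:
    """Runs indexing jobs one at a time on a background thread."""

    def __init__(self, index_fn: Callable[[str], None]) -> None:
        self._index_fn = index_fn          # does the actual (slow) indexing
        self._jobs: "queue.Queue[str]" = queue.Queue()
        self._status: Dict[str, str] = {}  # doc_id -> queued/processing/done/failed
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, doc_id: str) -> None:
        """Register the document immediately and return without waiting."""
        self._set(doc_id, "queued")
        self._jobs.put(doc_id)

    def status(self, doc_id: str) -> str:
        with self._lock:
            return self._status.get(doc_id, "unknown")

    def wait(self) -> None:
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def _set(self, doc_id: str, state: str) -> None:
        with self._lock:
            self._status[doc_id] = state

    def _worker(self) -> None:
        while True:
            doc_id = self._jobs.get()
            self._set(doc_id, "processing")
            try:
                self._index_fn(doc_id)
                self._set(doc_id, "done")
            except Exception:
                # A failed job is recorded instead of silently disappearing.
                self._set(doc_id, "failed")
            finally:
                self._jobs.task_done()
```

With something like this, the upload request could call `submit()` and return right away, while the document list shows the value of `status()` for each entry instead of hiding the document until indexing completes.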

GiteaMirror closed this issue