File deletion doesn't properly clean up database entries, causing issues with re-uploads #2751

Closed
opened 2025-11-11 15:13:34 -06:00 by GiteaMirror · 27 comments

Originally created by @eml-henn on GitHub (Nov 21, 2024).

Bug Report


Installation Method

Kubernetes on Azure Kubernetes Service

Environment

  • Open WebUI Version: 0.4.0
  • Operating System: AKSUbuntu-2204
  • Browser: Firefox 132.0.2

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [ ] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [x] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

  1. File deletion from a knowledge base should remove both the UI entry and the corresponding database content
  2. The operation should provide feedback about its success/failure
  3. Re-uploading a previously deleted file should work as if it were a new file

Actual Behavior:

  1. File deletion sometimes only removes the file from the frontend FilesTable
  2. No feedback is provided about whether the database deletion was successful
  3. The vector database can retain the old entries even after file deletion
  4. Attempting to re-upload the same file results in a "duplicate content" error

Description

Bug Summary:
When deleting a processed file from a knowledge base through the frontend, the file appears to be removed from the UI but its content sometimes remains in the vector database. This creates issues when trying to re-upload the same file, as the system detects it as duplicate content.

Reproduction Details

Frustratingly, this is a "sometimes" error, so my first proposal is to add logging that makes it easier to reproduce.

Steps to Reproduce:

  1. Upload a file to a knowledge base
  2. Delete the file using the remove button in the FilesTable
  3. Try to upload the same file again
  4. Observe the "duplicate content" error.

Logs and Screenshots

Docker Container Logs:
// When removing a file:
INFO: 10.1.1.4:0 - "POST /api/v1/knowledge/{id}/file/remove HTTP/1.1" 200 OK

// When trying to re-upload (error due to remaining content):
INFO: 10.1.1.4:0 - "GET /api/v1/knowledge/{id} HTTP/1.1" 200 OK
INFO [open_webui.apps.webui.routers.files] file.content_type: text/plain
INFO [open_webui.apps.retrieval.main] save_docs_to_vector_db: document {file} {file_collection_id}
INFO [open_webui.apps.retrieval.main] adding to collection {file_collection_id}
Collection {file_collection_id} does not exist.
INFO: 10.1.1.4:0 - "POST /api/v1/files/ HTTP/1.1" 200 OK
INFO [open_webui.apps.retrieval.main] save_docs_to_vector_db: document {file} {id}
INFO [open_webui.apps.retrieval.main] Document with hash {file hash} already exists
ERROR [open_webui.apps.retrieval.main] Duplicate content detected. Please provide unique content to proceed.
Traceback (most recent call last):
  File "/app/backend/open_webui/apps/retrieval/main.py", line 1001, in process_file
    raise e
  File "/app/backend/open_webui/apps/retrieval/main.py", line 975, in process_file
    result = save_docs_to_vector_db(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/backend/open_webui/apps/retrieval/main.py", line 759, in save_docs_to_vector_db
    raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
ValueError: Duplicate content detected. Please provide unique content to proceed.
INFO: 10.1.1.4:0 - "POST /api/v1/knowledge/{id}/file/add HTTP/1.1" 400 Bad Request

// Compare with Logs when adding a file:
INFO: 10.1.1.222:0 - "POST /api/v1/files/ HTTP/1.1" 200 OK
INFO [open_webui.apps.webui.routers.files] file.content_type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
INFO [open_webui.apps.retrieval.main] save_docs_to_vector_db: document {filename} {collection_id}
INFO [open_webui.apps.retrieval.main] collection {collection_id} already exists
INFO [open_webui.apps.retrieval.main] adding to collection {collection_id}

Additional Information

The issue appears to be in the file removal endpoint, @router.post("/{id}/file/remove") in knowledge.py. Currently:

  • The deletion operation doesn't verify if the vector database cleanup was successful
  • The frontend updates regardless of the backend operation's success
  • The deletion doesn't give feedback on the new state of the collection

Proposed Solution

  1. Add verification of vector database deletion
  2. Add error handling and user feedback
  3. Only update the frontend UI after confirmed successful deletion
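The three steps above amount to "delete, verify, then report". A minimal sketch of that ordering follows, written against a hypothetical in-memory stand-in for the vector store. InMemoryVectorStore, RemovalResult and remove_file_and_verify are invented for illustration; they are not Open WebUI's API and not the fix that was eventually merged.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB: collection name -> list of chunk metadata dicts."""
    collections: dict = field(default_factory=dict)

    def delete(self, collection: str, file_id: str) -> None:
        # Drop every chunk whose metadata points at the given file.
        docs = self.collections.get(collection, [])
        self.collections[collection] = [d for d in docs if d.get("file_id") != file_id]

    def count(self, collection: str, file_id: str) -> int:
        # Count chunks that still reference the file (0 means cleanup succeeded).
        return sum(1 for d in self.collections.get(collection, []) if d.get("file_id") == file_id)


@dataclass
class RemovalResult:
    ok: bool
    remaining_chunks: int  # -1 when the delete call itself failed
    message: str


def remove_file_and_verify(store, collection: str, file_id: str) -> RemovalResult:
    """Delete a file's chunks, then re-query to confirm nothing was left behind."""
    try:
        store.delete(collection, file_id)
    except Exception as exc:  # report backend failures instead of silently "succeeding"
        return RemovalResult(False, -1, f"vector DB delete failed: {exc}")

    remaining = store.count(collection, file_id)
    if remaining:
        return RemovalResult(False, remaining, f"{remaining} chunk(s) still present for {file_id}")
    return RemovalResult(True, 0, f"file {file_id} removed from {collection}")
```

In this sketch the frontend would drop the row from FilesTable only when ok is true and otherwise show message to the user; a real endpoint would map a failed result to an HTTP error status rather than 200 OK.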

@tjbck commented on GitHub (Nov 22, 2024):

Would love to investigate more but we'll need a more reliable way to reproduce the issue, definitely continue our troubleshooting journey and keep us posted!


@sreinwald commented on GitHub (Dec 2, 2024):

I just ran across this issue as well and I can reproduce it 100% using the API on v0.4.7, deployed via docker compose.

Steps to reproduce:

  • Reset Vector Storage/Knowledge
  • Upload file via API
  • Delete file via API
  • Open chroma.sqlite3 with a SQLite browser and run:
sqlite> select * from embedding_metadata;

In my specific example:

curl -X POST $HOST'/api/v1/files/' \
  --header 'Authorization: Bearer '$API_KEY \
  --header 'Accept: application/json' \
  -F 'file=@/home/sre/foo.txt;type=text/plain'
{"id":"6052a7df-99aa-4570-b5f5-1f06a54acddf","user_id":"126aaeba-bc1c-48b5-b47c-c9dc9a5394a1","hash":"ee1abdd5d09f7426d4950b928c7a73ba28e5085b06c7d909790a36891348ee29","filename":"foo.txt","data":{"content":""},"meta":{"name":"foo.txt","content_type":"text/plain","size":1491,"collection_name":"file-6052a7df-99aa-4570-b5f5-1f06a54acddf"},"created_at":1733137414,"updated_at":1733137414}

curl -X DELETE $HOST'/api/v1/files/6052a7df-99aa-4570-b5f5-1f06a54acddf' \
  --header 'Authorization: Bearer '$API_KEY
{"message":"File deleted successfully"}
root@ai /var/lib/docker/volumes/open-webui_open-webui/_data/vector_db # sqlite3 chroma.sqlite3
SQLite version 3.45.1 2024-01-30 16:01:20
Enter ".help" for usage hints.
sqlite> select * from embedding_metadata;
1|source|foo.txt|||
1|name|foo.txt|||
1|created_by|126aaeba-bc1c-48b5-b47c-c9dc9a5394a1|||
1|file_id|6052a7df-99aa-4570-b5f5-1f06a54acddf|||
1|start_index||0||
1|hash|ee1abdd5d09f7426d4950b928c7a73ba28e5085b06c7d909790a36891348ee29|||
1|embedding_config|{"engine": "ollama", "model": "bge-m3:latest"}|||
1|chroma:document|{content}|||

The issue with re-uploads very much seems to be related to the issue above, and I can reproduce it 100% with these steps using the API directly:

  • Reset Vector DB
  • Create a knowledgebase
  • Upload a file
  • Add the file to the knowledgebase
  • Delete the file
  • Upload the file again
  • Trying to add the file to the knowledgebase will now fail with 400: Duplicate content detected
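The embedding_metadata check from the first reproduction above can also be scripted with Python's built-in sqlite3 module. This sketch assumes the Chroma layout implied by the six-column output shown earlier (id, key, string_value, int_value, float_value, bool_value), with the Open WebUI file ID stored under the file_id key. Those column names are inferred, so confirm them with .schema embedding_metadata first. The function name leftover_chunk_ids is hypothetical.

```python
import sqlite3


def leftover_chunk_ids(db_path: str, file_id: str) -> list[int]:
    """Return embedding IDs whose metadata still references file_id."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            # Assumed Chroma schema: one metadata row per (embedding id, key).
            "SELECT id FROM embedding_metadata "
            "WHERE key = 'file_id' AND string_value = ? ORDER BY id",
            (file_id,),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Against the database above, leftover_chunk_ids("chroma.sqlite3", "6052a7df-99aa-4570-b5f5-1f06a54acddf") would return [1] even though the DELETE call reported success; an empty list would mean the chunks were actually removed.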

@Constey commented on GitHub (Dec 8, 2024):

I assumed I could simply call the API again with the same file to re-upload it. Uploading the file works, but adding it to the knowledge base returns a 400 Bad Request.
Running on: v0.4.8 (latest)

Steps to Reproduce:

  • Upload a file
  • Add the file to the knowledgebase
  • Upload the file again
  • Try adding it to the knowledgebase again

    Uploaded successfully with file_id: 967dd000-429c-46bb-9931-f352364dd746
    Adding file 967dd000-429c-46bb-9931-f352364dd746 to knowledge def088e5-a452-4dd9-b67d-4de942f3785b...
    requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://xxx/api/v1/knowledge/def088e5-a452-4dd9-b67d-4de942f3785b/file/add

My test script for upload:

import requests

def add_file_to_knowledge(token, knowledge_id, file_id, base_url):
    url = f'{base_url}/api/v1/knowledge/{knowledge_id}/file/add'
    headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json'
    }
    data = {'file_id': file_id}
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()
    return response.json()
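raise_for_status() only reports the status line, which hides why the server rejected the request. Open WebUI's backend is FastAPI, whose error bodies carry a detail field (as in the {"detail": ...} response posted later in this thread). A small helper such as the hypothetical describe_api_error below can surface that text, e.g. "Duplicate content detected.", instead of a bare 400. It is a sketch, not part of requests or Open WebUI.

```python
import json


def describe_api_error(status_code: int, body: str) -> str:
    """Build a readable message from a FastAPI-style error body ({"detail": ...})."""
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        # Body was not JSON, or was JSON but not an object: fall back to the raw text.
        detail = body
    if isinstance(detail, list):
        # Validation errors arrive as a list of objects with a "msg" field.
        detail = "; ".join(
            str(item.get("msg", item)) if isinstance(item, dict) else str(item)
            for item in detail
        )
    return f"HTTP {status_code}: {detail}"
```

In the script above, one option is to check response.ok and raise RuntimeError(describe_api_error(response.status_code, response.text)) before falling back to raise_for_status().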


@Constey commented on GitHub (Dec 8, 2024):

I think the issue is located somewhere around here: https://github.com/open-webui/open-webui/blob/29a271959556743e6deb4d55a5a982983335d7ab/backend/open_webui/apps/webui/routers/knowledge.py#L244


@AlgorithmicKing737 commented on GitHub (Dec 31, 2024):

Any solution yet?


@Classic298 commented on GitHub (Jan 13, 2025):

I want to add to this issue that files uploaded in normal chats are not deleted from the vector database either. Even if you delete the chat, the vector database does not shrink; it stays the same size and, in fact, only grows.

Even if you press "Reset Vector Storage" in the admin panel under Documents, nothing gets deleted from the database.

So: 1) nothing gets deleted, even when the chat the file was uploaded in is deleted, and 2) the reset vector storage button doesn't do anything either.

I am on version 0.5.4, but this has always been the case for me on previous versions as well. I installed via pip, if that matters. This issue has also been discussed here: https://github.com/open-webui/open-webui/discussions/5558
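One point worth separating out: SQLite does not return space to the filesystem when rows are deleted. Freed pages go onto a freelist for reuse, and the file only shrinks after VACUUM (or with auto_vacuum enabled), so a chroma.sqlite3 that never shrinks does not by itself prove that nothing was deleted. The sketch below reports the number of distinct file_id values (assuming the embedding_metadata layout shown earlier in this thread) alongside the file's page and freelist counts, which helps tell "rows not deleted" apart from "rows deleted but file not compacted". The function name vector_db_stats is hypothetical.

```python
import sqlite3


def vector_db_stats(db_path: str) -> dict:
    """Summarize live file references versus allocated and free space in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # Distinct Open WebUI file IDs still referenced by stored chunks (assumed schema).
        files = conn.execute(
            "SELECT COUNT(DISTINCT string_value) FROM embedding_metadata WHERE key = 'file_id'"
        ).fetchone()[0]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
    return {
        "distinct_file_ids": files,
        "file_bytes": page_count * page_size,          # size of the database file
        "reclaimable_bytes": free_pages * page_size,   # space a VACUUM could give back
    }
```

Running it before and after deleting a chat shows which case applies: if distinct_file_ids drops while file_bytes stays the same, the rows were removed and only the file was not compacted; if distinct_file_ids does not change, the rows are genuinely still there.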


@juananpe commented on GitHub (Jan 13, 2025):

@Classic298 Oh, I see. My PR https://github.com/open-webui/open-webui/pull/8499 fixes the situation when you remove a file that has been added via a Knowledge Base, but it doesn't fix the problem when the file is added directly from the Upload Documents option in the chat. I'll have a look at it tomorrow.


@Classic298 commented on GitHub (Jan 22, 2025):

Was the issue with files not being deleted from the db even after deleting the chat fixed?


@tjbck commented on GitHub (Jan 22, 2025):

Everything uploaded to Open WebUI is being kept for audit/logging purposes which is a security requirement for many organisations. You should utilise external scripts to clean the upload directory for now!
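A minimal sketch of such an external cleanup script follows. It assumes, without verification here, that uploads are stored as files named "<file_id>_<original filename>", and that you have already collected the set of file IDs still in use (for example from the files table or the API). The function name find_orphan_uploads is hypothetical, the script only touches the upload directory (not the vector database), and it defaults to a dry run that merely lists candidates.

```python
from pathlib import Path


def find_orphan_uploads(upload_dir: str, live_file_ids: set[str], dry_run: bool = True) -> list[Path]:
    """List (and, if dry_run is False, delete) uploads whose ID prefix is not in live_file_ids.

    Assumes uploads are named "<file_id>_<original name>"; files without an
    underscore are skipped rather than guessed at.
    """
    orphans = []
    for path in sorted(Path(upload_dir).iterdir()):
        if not path.is_file() or "_" not in path.name:
            continue
        file_id = path.name.split("_", 1)[0]
        if file_id not in live_file_ids:
            orphans.append(path)
            if not dry_run:
                path.unlink()
    return orphans
```

Reviewing the dry-run list before passing dry_run=False is deliberate: given the audit requirement described above, deleting uploads should be an explicit decision.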


@Classic298 commented on GitHub (Jan 23, 2025):

If files should not get deleted, then why was deleting files from the knowledge base implemented and accepted by you in the first place?

The implementation of file deletion from chats was also accepted and merged by you; it is 90% done. Only the actual deletion logic for the Chroma DB is missing.

It would be great to be able to configure this, via either an environment variable or an admin setting (either is fine).
An ever-growing Chroma database and uploads folder will become a problem fairly quickly, no?


@Classic298 commented on GitHub (Jan 25, 2025):

This issue has not been fixed yet; there is literally a placeholder where the missing code should go, just saying. Writing in the commit notes that 7181 is fixed is odd.


@Classic298 commented on GitHub (Jan 29, 2025):

Bump: the issue is not fixed, and the current implementation goes against the ethos Tim described.


@tjbck commented on GitHub (Jan 29, 2025):

Reverted #8499 4abede9a2b


@Jeevanhm commented on GitHub (Feb 6, 2025):

> From my thought i can just call the api with the same file again to do a re-upload. it seems the upload of a file works, but adding the file to the knowledge brings the 400 bad request issue. running on: (v0.4.8 (latest)
>
> Steps to Reproduce:
>
>   • Upload a file
>   • add the file to the knowledgebase
>   • Upload the file again
>   • Try adding it to the knowledgebase again
>     Uploaded successfully with file_id: 967dd000-429c-46bb-9931-f352364dd746
>     Adding file 967dd000-429c-46bb-9931-f352364dd746 to knowledge def088e5-a452-4dd9-b67d-4de942f3785b...
>     requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://xxx/api/v1/knowledge/def088e5-a452-4dd9-b67d-4de942f3785b/file/add
>
> My test script for upload:
>
>     def add_file_to_knowledge(token, knowledge_id, file_id, base_url):
>         url = f'{base_url}/api/v1/knowledge/{knowledge_id}/file/add'
>         headers = {
>             'Authorization': f'Bearer {token}',
>             'Content-Type': 'application/json'
>         }
>         data = {'file_id': file_id}
>         response = requests.post(url, headers=headers, json=data)
>         response.raise_for_status()
>         return response.json()

Can you please share the final script you used to upload files and add them to the knowledge base?


@Jeevanhm commented on GitHub (Feb 7, 2025):

I'm getting this error while adding files to Knowledge Collections. Any idea? Uploading and deleting the files works.

C:\Windows\system32>curl -X POST http://192.xx.xx.xx:/api/v1/knowledge/fb24ac30-d611-4988-90dc-b29fe10d118a/file/add -H "Authorization: Bearer sk-f821a6733a024915932dc30ed44b2d4a" -H "Content-Type: application/json" -d '{"file_id": "430e0a59-5fea-4d4e-87c1-8f2ad38c3dda"}'

{"detail":[{"type":"json_invalid","loc":["body",0],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting value"}}]}curl: (3) unmatched close brace/bracket in URL position 37:
430e0a59-5fea-4d4e-87c1-8f2ad38c3dda}'
^
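The error above appears to be a shell-quoting problem rather than an API one: cmd.exe does not treat single quotes as quote characters, so the JSON body is split into separate arguments. curl then sends the first fragment as the body (hence the JSON decode error) and treats the remainder as a second URL (hence "unmatched close brace/bracket in URL"). In cmd.exe the body would need double quotes with escaped inner quotes, e.g. -d "{\"file_id\": \"...\"}". Building the request in Python avoids shell quoting entirely; the standard-library sketch below is one way to do that, and the function name build_add_file_request is hypothetical.

```python
import json
import urllib.request


def build_add_file_request(base_url: str, knowledge_id: str, file_id: str, token: str) -> urllib.request.Request:
    """Prepare POST /api/v1/knowledge/{id}/file/add with a JSON body; no shell is involved."""
    body = json.dumps({"file_id": file_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/knowledge/{knowledge_id}/file/add",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it; a 4xx response raises urllib.error.HTTPError, whose read() method returns the server's JSON error body.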


@Constey commented on GitHub (Feb 7, 2025):

>> From my thought i can just call the api with the same file again to do a re-upload. it seems the upload of a file works, but adding the file to the knowledge brings the 400 bad request issue. running on: (v0.4.8 (latest)
>>
>> Steps to Reproduce:
>>
>>   • Upload a file
>>   • add the file to the knowledgebase
>>   • Upload the file again
>>   • Try adding it to the knowledgebase again
>>     Uploaded successfully with file_id: 967dd000-429c-46bb-9931-f352364dd746
>>     Adding file 967dd000-429c-46bb-9931-f352364dd746 to knowledge def088e5-a452-4dd9-b67d-4de942f3785b...
>>     requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://xxx/api/v1/knowledge/def088e5-a452-4dd9-b67d-4de942f3785b/file/add
>>
>> My test script for upload:
>>
>>     def add_file_to_knowledge(token, knowledge_id, file_id, base_url):
>>         url = f'{base_url}/api/v1/knowledge/{knowledge_id}/file/add'
>>         headers = {
>>             'Authorization': f'Bearer {token}',
>>             'Content-Type': 'application/json'
>>         }
>>         data = {'file_id': file_id}
>>         response = requests.post(url, headers=headers, json=data)
>>         response.raise_for_status()
>>         return response.json()
>
> can you share the final script used to upload and add files to the knowledge base please.

Like this. We put the files in separate knowledge bases, so it's a bit more complex, but it should show how it works:

"""
This script uploads .txt files from a directory structure into two different knowledgebases
based on a "special_folder" configuration. If a folder matches the special_folder, the file
is uploaded to knowledgebase B, otherwise to knowledgebase A.

New Feature (version 1.4.0):
    A command-line argument `--only-special` has been added. When used, only files within
    the special_folder subdirectories will be uploaded. All other files/folders will be skipped.

Exception handling has been added to ensure that if an error occurs (like an HTTPError from
the server), the script logs the error to upload_webui.log and continues processing the
remaining files.

A per-space summary is included to show how many files were processed, how many were
successfully uploaded, and how many failed to upload for each top-level folder (space).

Prerequisites:
    pip install requests

Usage:
    python upload_script.py [--only-special]

    --only-special   Only upload files that belong to the special folder (e.g., "5027").

Configuration:
    - Update config.py so that it defines a dict named config containing:
      {
        "output_text_base_dir": "text_output",
        "openwebui_url": "http://localhost:3000",
        "openwebui_token": "YOUR_OPENWEBUI_TOKEN_HERE",
        "openwebui_knowledge_id_a": "KNOWLEDGE_COLLECTION_ID_A",
        "openwebui_knowledge_id_b": "KNOWLEDGE_COLLECTION_ID_B",
        "special_folder": "5027"
      }
"""

import os
import sys
import argparse
import requests
import logging
from config import config

# Configure logging to a file named "upload_webui.log".
# INFO level logs general operation flow, WARNING for unusual states, ERROR for exceptions.
logging.basicConfig(
    filename='upload_webui.log',
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

def upload_file(token, file_path, base_url):
    """
    Uploads a single file to the web UI. Returns the JSON response from the server.
    Raises requests.exceptions.HTTPError if the server responds with an error status code.

    :param token: Auth token.
    :param file_path: Path to the file being uploaded.
    :param base_url: Base URL for the OpenWebUI server.
    :return: JSON response from the server with the 'id' of the uploaded file.
    """
    url = f'{base_url}/api/v1/files/'
    headers = {
        'Authorization': f'Bearer {token}',
        'Accept': 'application/json'
    }

    with open(file_path, 'rb') as f:
        files = {'file': f}
        response = requests.post(url, headers=headers, files=files)
    response.raise_for_status()  # Will raise HTTPError for 4xx/5xx responses
    return response.json()

def add_file_to_knowledge(token, knowledge_id, file_id, base_url):
    """
    Adds an uploaded file to a specified knowledgebase.

    :param token: Auth token.
    :param knowledge_id: ID of the target knowledgebase.
    :param file_id: ID of the file to be added.
    :param base_url: Base URL for the OpenWebUI server.
    :return: JSON response from the server.
    """
    url = f'{base_url}/api/v1/knowledge/{knowledge_id}/file/add'
    headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json'
    }
    data = {'file_id': file_id}
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()  # Will raise HTTPError for 4xx/5xx responses
    return response.json()

def main():
    """
    Main logic to:
    1. Parse command-line arguments.
    2. Traverse text_base_dir for .txt files.
    3. If --only-special is set, only process files in the special_folder.
    4. Decide which knowledgebase to upload the file to based on 'special_folder'.
    5. Attempt file upload and knowledgebase addition, handle errors by logging and continuing.
    6. Keep track of per-space metrics (processed, success, fail).
    7. Print and log summary metrics for each space at the end of the script.
    """
    parser = argparse.ArgumentParser(description="Upload text files to knowledgebase(s).")
    parser.add_argument(
        "--only-special",
        action="store_true",
        help="Only upload files in the special_folder (skip others)."
    )
    args = parser.parse_args()
    only_special = args.only_special

    text_base_dir = config["output_text_base_dir"]
    token = config["openwebui_token"]
    base_url = config["openwebui_url"]
    knowledge_id_a = config["openwebui_knowledge_id_a"]
    knowledge_id_b = config["openwebui_knowledge_id_b"]
    special_folder = config["special_folder"]

    if not os.path.exists(text_base_dir):
        msg = f"Text directory '{text_base_dir}' does not exist. Nothing to upload."
        print(msg)
        logging.warning(msg)
        return

    # Counters for knowledgebase A and B
    uploaded_count_a = 0
    uploaded_count_b = 0

    # Dictionary to track counts per space: { 'SPACEKEY': {'processed': 0, 'success': 0, 'fail': 0}, ... }
    space_summary = {}

    for root, dirs, files in os.walk(text_base_dir):
        # Determine the relative path from text_base_dir
        relative_path = os.path.relpath(root, text_base_dir)
        current_folders = relative_path.split(os.sep)

        # Identify the top-level space name (if we're not at the root of text_base_dir)
        if relative_path == ".":
            space_name = "ROOT"
        else:
            space_name = current_folders[0]

        # Ensure this space_name is in our space_summary dictionary
        if space_name not in space_summary:
            space_summary[space_name] = {"processed": 0, "success": 0, "fail": 0}

        # If only_special is set, skip this entire path unless it has special_folder
        if only_special and special_folder not in current_folders:
            continue

        # Determine the target knowledgebase (A or B)
        if special_folder in current_folders:
            target_knowledge_id = knowledge_id_b
            target_count_var = "B"
        else:
            target_knowledge_id = knowledge_id_a
            target_count_var = "A"

        for file in files:
            if file.endswith(".txt"):
                # We have a text file; increment the processed counter for this space
                space_summary[space_name]["processed"] += 1

                file_path = os.path.join(root, file)
                message = f"Uploading file: {file_path} to knowledgebase {target_knowledge_id}"
                print(message)
                logging.info(message)

                # Attempt the upload
                try:
                    upload_response = upload_file(token, file_path, base_url)
                    file_id = upload_response.get("id")
                except requests.exceptions.HTTPError as http_err:
                    error_msg = (
                        f"HTTP error occurred while uploading {file_path}: {http_err}"
                    )
                    print(error_msg)
                    logging.error(error_msg, exc_info=True)
                    # Mark as fail for this space
                    space_summary[space_name]["fail"] += 1
                    continue  # Skip to the next file
                except Exception as ex:
                    error_msg = (
                        f"Unexpected error occurred while uploading {file_path}: {ex}"
                    )
                    print(error_msg)
                    logging.error(error_msg, exc_info=True)
                    # Mark as fail for this space
                    space_summary[space_name]["fail"] += 1
                    continue  # Skip to the next file

                if file_id:
                    success_msg = f"  Uploaded successfully with file_id: {file_id}"
                    print(success_msg)
                    logging.info(success_msg)
                else:
                    warn_msg = f"  Could not get file_id from response: {upload_response}"
                    print(warn_msg)
                    logging.warning(warn_msg)
                    # Mark as fail for this space since we have no file_id
                    space_summary[space_name]["fail"] += 1
                    continue

                # Attempt to add the uploaded file to the chosen knowledgebase
                try:
                    adding_msg = f"  Adding file {file_id} to knowledge {target_knowledge_id}..."
                    print(adding_msg)
                    logging.info(adding_msg)

                    add_response = add_file_to_knowledge(token, target_knowledge_id, file_id, base_url)

                    added_msg = f"  Added to knowledge successfully. Response: {add_response}"
                    print(added_msg)
                    logging.info(added_msg)
                except requests.exceptions.HTTPError as http_err:
                    error_msg = (
                        f"HTTP error occurred while adding file {file_id} to knowledge "
                        f"{target_knowledge_id}: {http_err}"
                    )
                    print(error_msg)
                    logging.error(error_msg, exc_info=True)
                    space_summary[space_name]["fail"] += 1
                    continue
                except Exception as ex:
                    error_msg = (
                        f"Unexpected error occurred while adding file {file_id} to knowledge "
                        f"{target_knowledge_id}: {ex}"
                    )
                    print(error_msg)
                    logging.error(error_msg, exc_info=True)
                    space_summary[space_name]["fail"] += 1
                    continue

                # If we made it this far, the file was successfully uploaded and added
                if target_count_var == "A":
                    uploaded_count_a += 1
                else:
                    uploaded_count_b += 1

                space_summary[space_name]["success"] += 1

    print("Finished uploading.")
    print(f"Uploaded {uploaded_count_a} files to knowledge A (ID: {knowledge_id_a})")
    print(f"Uploaded {uploaded_count_b} files to knowledge B (ID: {knowledge_id_b})")
    logging.info(
        f"Finished uploading. Uploaded {uploaded_count_a} files to knowledge A and "
        f"{uploaded_count_b} files to knowledge B."
    )

    # Print and log a summary for each space:
    print("\nPer-space summary:")
    logging.info("Per-space summary:")
    for space, stats in space_summary.items():
        processed = stats["processed"]
        success = stats["success"]
        fail = stats["fail"]
        msg_summary = (
            f"Space '{space}' - Processed: {processed}, Successful: {success}, Failed: {fail}"
        )
        print(msg_summary)
        logging.info(msg_summary)

if __name__ == "__main__":
    main()
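When the add step fails with nothing more than `400 Client Error: Bad Request`, the server's reason is usually in the response body. Open WebUI's API is built on FastAPI, which serializes `HTTPException` errors as `{"detail": ...}`. Below is a minimal sketch of a helper that writes that detail to the log, so a duplicate-content rejection can be told apart from other 400s. The name `describe_http_error` is hypothetical and is not part of the script above.

```python
import json


def describe_http_error(status_code: int, body: str) -> str:
    """Return a one-line summary of an HTTP error response.

    Assumes FastAPI-style error bodies ({"detail": ...}). If that key is
    missing, falls back to the raw body truncated to 200 characters.
    """
    detail = None
    try:
        payload = json.loads(body)
        if isinstance(payload, dict):
            detail = payload.get("detail")
    except ValueError:
        # Not JSON at all, e.g. an HTML error page from a reverse proxy.
        pass
    if detail is None:
        detail = body.strip()[:200] or "<empty body>"
    return f"HTTP {status_code}: {detail}"
```

This has not been tested against a live server. One way to wire it in is to log `describe_http_error(http_err.response.status_code, http_err.response.text)` in the `except requests.exceptions.HTTPError as http_err:` branches, in place of `http_err` on its own.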


@Jeevanhm commented on GitHub (Feb 7, 2025):

Thank you, it works like a charm!


@Jeevanhm commented on GitHub (Feb 9, 2025):

@Constey how do you manage the files on the server?

I'm unable to delete specific files by file ID, but when I use "all", the files are deleted on the server.

curl -X 'DELETE' 'http://192.xx.xx.xx:8080/api/v1/files/2bc4340d-4b70-477d-b621-714b854c9817' -H 'accept: application/json'

curl -X 'DELETE' 'http://192.xx.xx.xx:8080/api/v1/files/all' -H 'accept: application/json'
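Neither `curl` call above sends an `Authorization` header, while the upload script earlier in this thread sends `Bearer {token}` on every request. If per-ID deletes keep failing, retrying with a token may help. Below is a minimal standard-library sketch that builds, but does not send, an authenticated `DELETE /api/v1/files/{id}` request. The base URL and token are placeholders, and `build_delete_file_request` is a hypothetical name. Also note the `# We should add Chroma cleanup here` comment in the files router quoted later in this thread: even a successful delete through this endpoint may leave vectors behind.

```python
import urllib.parse
import urllib.request


def build_delete_file_request(base_url: str, token: str, file_id: str) -> urllib.request.Request:
    """Build an authenticated DELETE request for /api/v1/files/{file_id}.

    The request is only constructed here. Pass it to urllib.request.urlopen()
    to actually send it.
    """
    url = f"{base_url.rstrip('/')}/api/v1/files/{urllib.parse.quote(file_id, safe='')}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
```

To send the request, call `urllib.request.urlopen(req)`. An error status raises `urllib.error.HTTPError`, and its `.read()` returns the server's JSON error body.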


@Constey commented on GitHub (Feb 9, 2025):

My initial plan was to just overwrite the files. Since that did not work, for now I've just created new KBs, relinked them to the model, and deleted the old KBs manually. I still have to test the current behaviour (I guess it's not fixed), but if you find a way to delete the old ones, that would be nice.


@ozp commented on GitHub (Feb 10, 2025):

Hello,

I uploaded the OpenWebUI docs to the RAG system following the documentation instructions. This allows me to ask the chat questions about OpenWebUI.

However, I noticed poor performance with the default RAG settings. So, I created a new configuration that requires deleting all previously uploaded files and re-uploading them.

This is when I encountered the duplication issue.

What I’ve Tried So Far:

  • Deleted files from the directory (uploads, cache, vector_db, etc.).
  • Deleted data from the database (with the help of GPT, Claude, DeepSeek, and others).
  • Here are some of the SQL commands I attempted:
    DELETE FROM embedding_metadata;
    DELETE FROM embeddings;
    DELETE FROM segment_metadata;
    DELETE FROM segments;
    DELETE FROM collections;
    VACUUM;
    
  • Modified .md and .mdx files by adding an extra line, yet they were still detected as duplicates.

Request:

Could you provide a step-by-step guide on how to completely remove all previously indexed files from the knowledge base? I’d really appreciate it.
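Deleting rows directly from Chroma's internal SQLite tables, as in the list above, bypasses Chroma's own bookkeeping and can leave the on-disk index files behind. A gentler option is to delete collections through the Chroma client itself. The sketch below makes several assumptions:

- Open WebUI uses its default embedded Chroma store, at the `vector_db` path shown later in this thread.
- Per-file collections are named `file-{id}`, as in the endpoint posted further down.
- The `chromadb` version you install matches the one Open WebUI uses.

The helper names are hypothetical. Stop Open WebUI and back up the folder first. The sketch defaults to a dry run, and it does not touch Open WebUI's own file and knowledge records, which are kept in a separate database.

```python
from typing import List


def _name(collection) -> str:
    # Some chromadb versions return plain names from list_collections();
    # others return Collection objects with a .name attribute.
    return collection if isinstance(collection, str) else collection.name


def purge_collections(client, prefix: str = "file-", dry_run: bool = True) -> List[str]:
    """Delete every collection whose name starts with `prefix`.

    Returns the matched names (sorted). With dry_run=True nothing is deleted.
    """
    matched = sorted(n for n in map(_name, client.list_collections()) if n.startswith(prefix))
    if not dry_run:
        for name in matched:
            client.delete_collection(name=name)
    return matched


def run(path: str, prefix: str = "file-", dry_run: bool = True) -> None:
    # Imported lazily so the helper above can be used without chromadb installed.
    import chromadb

    client = chromadb.PersistentClient(path=path)
    for name in purge_collections(client, prefix=prefix, dry_run=dry_run):
        print(("would delete: " if dry_run else "deleted: ") + name)
```

For example, `run("/var/lib/docker/volumes/open-webui/_data/vector_db")` lists what would be removed. Passing `prefix=""` matches every collection, so it would also remove knowledge-base collections.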


@gilbrotheraway commented on GitHub (Mar 28, 2025):

In 24 hours, with fewer than 5 knowledge bases, my vector_db folder has:

Total disk usage: 7.7 GiB
Apparent size: 7.6 GiB
Items: 9881

And it's not even user error: it happens because uploads fail when uploading many files (GitHub .md docs).


@Constey commented on GitHub (Apr 1, 2025):

I think this issue still exists.
If I have a knowledge base that blows my vector DB up to 10 GB and I remove the whole knowledge base, the space in the vector DB is not freed:
/var/lib/docker/volumes/open-webui/_data/vector_db
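Part of this may be ordinary SQLite behaviour. Deleted rows free pages inside the database file, but the file does not shrink until `VACUUM` runs, unless auto-vacuum is enabled. Chroma's persistent client also appears to keep each vector index segment in its own subdirectory of the store, named after the segment ID. Assuming that layout, with a `chroma.sqlite3` file whose `segments` table lists the live segment IDs, below is a read-only sketch that reports UUID-named subdirectories no longer referenced by any segment. It deletes nothing, and the function name is hypothetical.

```python
import os
import sqlite3
import uuid
from pathlib import Path
from typing import List


def find_orphan_segment_dirs(persist_dir: str) -> List[str]:
    """Return UUID-named subdirectories of persist_dir that are not listed
    in the `segments` table of persist_dir/chroma.sqlite3.

    Read-only: nothing is modified or deleted.
    """
    db_path = Path(persist_dir, "chroma.sqlite3").resolve()
    # Open the database read-only so a running server's file is never written to.
    conn = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
    try:
        known = {row[0] for row in conn.execute("SELECT id FROM segments")}
    finally:
        conn.close()

    orphans = []
    for entry in sorted(os.listdir(persist_dir)):
        if not os.path.isdir(os.path.join(persist_dir, entry)):
            continue
        try:
            uuid.UUID(entry)  # Skip anything that is not UUID-named.
        except ValueError:
            continue
        if entry not in known:
            orphans.append(entry)
    return orphans
```

Review the list before removing anything. Ideally, stop Open WebUI and back up the folder first.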


@Classic298 commented on GitHub (Apr 1, 2025):

Yes, according to Tim, this is intentional. :/


@Jean-Reinhold commented on GitHub (Jul 15, 2025):

Hey guys, I fixed this by modifying my file router to add an endpoint that deletes files that are not attached to a knowledge base:

@router.post("/{id}/remove", response_model=dict)
def remove_file_from_knowledge_by_id(
    id: str,
    user=Depends(get_verified_user),
):

    if user.role != "admin":
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail=ERROR_MESSAGES.ACCESS_PROHIBITED,
        )

    file = Files.get_file_by_id(id)
    if not file:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    # Remove content from the vector database
    VECTOR_DB_CLIENT.delete(
        collection_name=f"file-{file.id}", filter={"file_id": id}
    )

    # Also drop the per-file collection itself, if it still exists
    file_collection = f"file-{id}"
    if VECTOR_DB_CLIENT.has_collection(collection_name=file_collection):
        VECTOR_DB_CLIENT.delete_collection(collection_name=file_collection)

    # Delete file from database
    Files.delete_file_by_id(id)

    return {"message": "File deleted successfully"}
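The endpoint above clears the per-file collection `file-{id}`. The duplicate-content check may also consult the knowledge base's own collection; that is an assumption this thread does not confirm. If it does, the file's chunks stored there would survive the delete. Below is a sketch that assumes knowledge-base collections are named by the knowledge ID and that their chunks carry the same `file_id` metadata the endpoint filters on. `purge_file_vectors` is a hypothetical helper. It accepts any client exposing the `has_collection`, `delete`, and `delete_collection` calls used in the endpoint above.

```python
from typing import Iterable, List


def purge_file_vectors(vector_db, file_id: str, knowledge_ids: Iterable[str]) -> List[str]:
    """Remove a file's chunks from knowledge-base collections and drop its
    per-file collection. Returns the names of the collections touched.

    Assumes vector_db exposes has_collection(collection_name=...),
    delete(collection_name=..., filter=...) and
    delete_collection(collection_name=...).
    """
    touched = []
    for kb_id in knowledge_ids:
        # Assumption: each knowledge base stores its chunks in a collection named by its ID.
        if vector_db.has_collection(collection_name=kb_id):
            vector_db.delete(collection_name=kb_id, filter={"file_id": file_id})
            touched.append(kb_id)

    # The per-file collection naming matches the endpoint above.
    per_file = f"file-{file_id}"
    if vector_db.has_collection(collection_name=per_file):
        vector_db.delete_collection(collection_name=per_file)
        touched.append(per_file)
    return touched
```

Which knowledge IDs to pass depends on how your version records the link between knowledge bases and files. Look that up before relying on this.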

Here is the full code:

import logging
import os
import uuid
from pathlib import Path
from typing import Optional
from urllib.parse import quote

from fastapi import APIRouter, Depends, File, HTTPException, Request, UploadFile, status
from fastapi.responses import FileResponse, StreamingResponse
from open_webui.constants import ERROR_MESSAGES
from open_webui.env import SRC_LOG_LEVELS
from open_webui.models.files import (
    FileForm,
    FileModel,
    FileModelResponse,
    Files,
)
from open_webui.retrieval.vector.connector import VECTOR_DB_CLIENT
from open_webui.routers.retrieval import ProcessFileForm, process_file
from open_webui.storage.provider import Storage
from open_webui.utils.auth import get_admin_user, get_verified_user
from pydantic import BaseModel

log = logging.getLogger(__name__)
log.setLevel(SRC_LOG_LEVELS["MODELS"])


router = APIRouter()

############################
# Upload File
############################


@router.post("/", response_model=FileModelResponse)
def upload_file(
    request: Request,
    file: UploadFile = File(...),
    user=Depends(get_verified_user),
    file_metadata: dict = {},
):
    log.info(f"file.content_type: {file.content_type}")
    try:
        unsanitized_filename = file.filename
        filename = os.path.basename(unsanitized_filename)

        # replace filename with uuid
        id = str(uuid.uuid4())
        name = filename
        filename = f"{id}_{filename}"
        contents, file_path = Storage.upload_file(file.file, filename)

        file_item = Files.insert_new_file(
            user.id,
            FileForm(
                **{
                    "id": id,
                    "filename": name,
                    "path": file_path,
                    "meta": {
                        "name": name,
                        "content_type": file.content_type,
                        "size": len(contents),
                        "data": file_metadata,
                    },
                }
            ),
        )

        try:
            process_file(request, ProcessFileForm(file_id=id), user=user)
            file_item = Files.get_file_by_id(id=id)
        except Exception as e:
            log.exception(e)
            log.error(f"Error processing file: {file_item.id}")
            file_item = FileModelResponse(
                **{
                    **file_item.model_dump(),
                    "error": str(e.detail) if hasattr(e, "detail") else str(e),
                }
            )

        if file_item:
            return file_item
        else:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=ERROR_MESSAGES.DEFAULT("Error uploading file"),
            )

    except Exception as e:
        log.exception(e)
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.DEFAULT(e),
        )


############################
# List Files
############################


@router.get("/", response_model=list[FileModelResponse])
async def list_files(user=Depends(get_verified_user)):
    if user.role == "admin":
        files = Files.get_files()
    else:
        files = Files.get_files_by_user_id(user.id)
    return files


############################
# Delete All Files
############################


@router.delete("/all")
async def delete_all_files(user=Depends(get_admin_user)):
    result = Files.delete_all_files()
    if result:
        try:
            Storage.delete_all_files()
        except Exception as e:
            log.exception(e)
            log.error("Error deleting files")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=ERROR_MESSAGES.DEFAULT("Error deleting files"),
            )
        return {"message": "All files deleted successfully"}
    else:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.DEFAULT("Error deleting files"),
        )


############################
# Get File By Id
############################


@router.get("/{id}", response_model=Optional[FileModel])
async def get_file_by_id(id: str, user=Depends(get_verified_user)):
    file = Files.get_file_by_id(id)

    if file and (file.user_id == user.id or user.role == "admin"):
        return file
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )


############################
# Get File Data Content By Id
############################


@router.get("/{id}/data/content")
async def get_file_data_content_by_id(id: str, user=Depends(get_verified_user)):
    file = Files.get_file_by_id(id)

    if file and (file.user_id == user.id or user.role == "admin"):
        return {"content": file.data.get("content", "")}
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )


############################
# Update File Data Content By Id
############################


class ContentForm(BaseModel):
    content: str


@router.post("/{id}/data/content/update")
async def update_file_data_content_by_id(
    request: Request, id: str, form_data: ContentForm, user=Depends(get_verified_user)
):
    file = Files.get_file_by_id(id)

    if file and (file.user_id == user.id or user.role == "admin"):
        try:
            process_file(
                request,
                ProcessFileForm(file_id=id, content=form_data.content),
                user=user,
            )
            file = Files.get_file_by_id(id=id)
        except Exception as e:
            log.exception(e)
            log.error(f"Error processing file: {file.id}")

        return {"content": file.data.get("content", "")}
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )


############################
# Get File Content By Id
############################


@router.get("/{id}/content")
async def get_file_content_by_id(id: str, user=Depends(get_verified_user)):
    file = Files.get_file_by_id(id)
    if file and (file.user_id == user.id or user.role == "admin"):
        try:
            file_path = Storage.get_file(file.path)
            file_path = Path(file_path)

            # Check if the file already exists in the cache
            if file_path.is_file():
                # Handle Unicode filenames
                filename = file.meta.get("name", file.filename)
                encoded_filename = quote(filename)  # RFC5987 encoding

                headers = {}
                if file.meta.get("content_type") not in [
                    "application/pdf",
                    "text/plain",
                ]:
                    headers = {
                        **headers,
                        "Content-Disposition": f"attachment; filename*=UTF-8''{encoded_filename}",
                    }

                return FileResponse(file_path, headers=headers)

            else:
                raise HTTPException(
                    status_code=status.HTTP_404_NOT_FOUND,
                    detail=ERROR_MESSAGES.NOT_FOUND,
                )
        except Exception as e:
            log.exception(e)
            log.error("Error getting file content")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=ERROR_MESSAGES.DEFAULT("Error getting file content"),
            )
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )


@router.get("/{id}/content/html")
async def get_html_file_content_by_id(id: str, user=Depends(get_verified_user)):
    file = Files.get_file_by_id(id)
    if file and (file.user_id == user.id or user.role == "admin"):
        try:
            file_path = Storage.get_file(file.path)
            file_path = Path(file_path)

            # Check if the file already exists in the cache
            if file_path.is_file():
                print(f"file_path: {file_path}")
                return FileResponse(file_path)
            else:
                raise HTTPException(
                    status_code=status.HTTP_404_NOT_FOUND,
                    detail=ERROR_MESSAGES.NOT_FOUND,
                )
        except Exception as e:
            log.exception(e)
            log.error("Error getting file content")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=ERROR_MESSAGES.DEFAULT("Error getting file content"),
            )
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )


@router.get("/{id}/content/{file_name}")
async def get_file_content_by_id(id: str, user=Depends(get_verified_user)):
    file = Files.get_file_by_id(id)

    if file and (file.user_id == user.id or user.role == "admin"):
        file_path = file.path

        # Handle Unicode filenames
        filename = file.meta.get("name", file.filename)
        encoded_filename = quote(filename)  # RFC5987 encoding
        headers = {
            "Content-Disposition": f"attachment; filename*=UTF-8''{encoded_filename}"
        }

        if file_path:
            file_path = Storage.get_file(file_path)
            file_path = Path(file_path)

            # Check if the file already exists in the cache
            if file_path.is_file():
                return FileResponse(file_path, headers=headers)
            else:
                raise HTTPException(
                    status_code=status.HTTP_404_NOT_FOUND,
                    detail=ERROR_MESSAGES.NOT_FOUND,
                )
        else:
            # No stored file path: stream the extracted text content as plain text
            file_content = (file.data or {}).get("content", "")

            # Create a generator that encodes the file content
            def generator():
                yield file_content.encode("utf-8")

            return StreamingResponse(
                generator(),
                media_type="text/plain",
                headers=headers,
            )
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )


############################
# Delete File By Id
############################


@router.delete("/{id}")
async def delete_file_by_id(id: str, user=Depends(get_verified_user)):
    file = Files.get_file_by_id(id)
    if file and (file.user_id == user.id or user.role == "admin"):
        # We should add Chroma cleanup here

        result = Files.delete_file_by_id(id)
        if result:
            try:
                Storage.delete_file(file.path)
            except Exception as e:
                log.exception(e)
                log.error("Error deleting files")
                raise HTTPException(
                    status_code=status.HTTP_400_BAD_REQUEST,
                    detail=ERROR_MESSAGES.DEFAULT("Error deleting files"),
                )
            return {"message": "File deleted successfully"}
        else:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=ERROR_MESSAGES.DEFAULT("Error deleting file"),
            )
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

@router.post("/{id}/remove", response_model=dict)
def remove_file_from_knowledge_by_id(
    id: str,
    user=Depends(get_verified_user),
):

    if user.role != "admin":
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail=ERROR_MESSAGES.ACCESS_PROHIBITED,
        )

    file = Files.get_file_by_id(id)
    if not file:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=ERROR_MESSAGES.NOT_FOUND,
        )

    # Remove the file's content from the vector database. Check that the
    # per-file collection exists first: some backends may raise on a missing one.
    file_collection = f"file-{id}"
    if VECTOR_DB_CLIENT.has_collection(collection_name=file_collection):
        VECTOR_DB_CLIENT.delete(
            collection_name=file_collection, filter={"file_id": id}
        )
        VECTOR_DB_CLIENT.delete_collection(collection_name=file_collection)

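    # Delete file from database, reporting failure instead of always claiming success
    if not Files.delete_file_by_id(id):
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=ERROR_MESSAGES.DEFAULT("Error deleting file"),
        )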

    return {"message": "File deleted successfully"}
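The `# We should add Chroma cleanup here` note in `delete_file_by_id`, together with the expected behavior in the report (remove both the database entry and the vector content, and say whether that worked), suggests a cleanup order along the following lines. This is a minimal sketch, not Open WebUI code: `InMemoryVectorClient` is a hypothetical stand-in that only mimics the `has_collection`, `delete_collection` and filtered `delete` calls used in `remove_file_from_knowledge_by_id` (its `count` method is invented for verification), `files_table` is a plain dict in place of the `Files` model, and it assumes each knowledge base's chunks live in a collection named after the knowledge base ID and carry a `file_id` metadata key.

```python
import logging

log = logging.getLogger("file_cleanup")


class InMemoryVectorClient:
    """Hypothetical stand-in for VECTOR_DB_CLIENT: collection name -> list of metadata dicts."""

    def __init__(self) -> None:
        self.collections: dict[str, list[dict]] = {}

    def has_collection(self, collection_name: str) -> bool:
        return collection_name in self.collections

    def delete_collection(self, collection_name: str) -> None:
        self.collections.pop(collection_name, None)

    def delete(self, collection_name: str, filter: dict) -> None:
        # Drop every entry whose metadata matches all key/value pairs in `filter`.
        self.collections[collection_name] = [
            meta
            for meta in self.collections[collection_name]
            if not all(meta.get(k) == v for k, v in filter.items())
        ]

    def count(self, collection_name: str, filter: dict) -> int:
        # Invented helper, used only to verify that the cleanup worked.
        return sum(
            all(meta.get(k) == v for k, v in filter.items())
            for meta in self.collections.get(collection_name, [])
        )


def delete_file_everywhere(
    file_id: str,
    knowledge_ids: list[str],
    vector_db: InMemoryVectorClient,
    files_table: dict[str, dict],
) -> dict[str, bool]:
    """Remove a file from the vector store and the files table, reporting each step."""
    report: dict[str, bool] = {}
    file_collection = f"file-{file_id}"

    # Step 1: drop the per-file collection, but only if it exists.
    try:
        if vector_db.has_collection(collection_name=file_collection):
            vector_db.delete_collection(collection_name=file_collection)
        report["file_collection"] = not vector_db.has_collection(
            collection_name=file_collection
        )
    except Exception:
        log.exception("Could not drop %s", file_collection)
        report["file_collection"] = False

    # Step 2: purge this file's chunks from each knowledge collection, then verify.
    try:
        for knowledge_id in knowledge_ids:
            if vector_db.has_collection(collection_name=knowledge_id):
                vector_db.delete(
                    collection_name=knowledge_id, filter={"file_id": file_id}
                )
        report["knowledge_collections"] = all(
            vector_db.count(collection_name=k, filter={"file_id": file_id}) == 0
            for k in knowledge_ids
        )
    except Exception:
        log.exception("Could not purge chunks of %s", file_id)
        report["knowledge_collections"] = False

    # Step 3: delete the database row last, and only if the vector cleanup
    # succeeded, so a failure leaves a record the delete can be retried on.
    report["db_row"] = all(report.values()) and files_table.pop(file_id, None) is not None

    for step, ok in report.items():
        log.log(logging.INFO if ok else logging.WARNING, "delete %s: %s %s",
                file_id, step, "ok" if ok else "FAILED")
    return report


vdb = InMemoryVectorClient()
vdb.collections = {
    "file-abc": [{"file_id": "abc"}],
    "kb-1": [{"file_id": "abc"}, {"file_id": "xyz"}],
}
files = {"abc": {"filename": "report.pdf"}}
print(delete_file_everywhere("abc", ["kb-1"], vdb, files))
# {'file_collection': True, 'knowledge_collections': True, 'db_row': True}
```

Deleting the database row last is the design choice that matters here: if a vector cleanup step fails, the file record survives, the failure is logged and returned to the caller, and the delete can be retried instead of leaving invisible vectors behind that later trigger the "duplicate content" error.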

@bluelight773 commented on GitHub (Jul 27, 2025):

It seems to me that this is still the case (as of v0.6.18) if you:

  1. Add a file to knowledge base via the UI (/workspace/knowledge/{id})
  2. Delete the file from Open WebUI itself via the DELETE /api/v1/files/{id} endpoint, rather than removing it from the knowledge base.

Then, when viewing the knowledge base in the UI (/workspace/knowledge/{id}), you'll no longer see that file listed. However, if you try re-adding the same file via the UI, you'll get a "duplicate content" error.

I was able to see the file ID listed when calling the /api/v1/knowledge/{id} endpoint, but I was not able to remove it from the knowledge base using any endpoint related to deletion/removal provided by Open WebUI.
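One way to confirm this state is to compare the file IDs a knowledge base still lists with the IDs of files that actually exist (presumably obtainable from GET /api/v1/files/, which lists all files for an admin). The sketch assumes the knowledge record keeps its file list under `data["file_ids"]`; that key is an assumption about the response shape, and `find_orphaned_file_ids` is a hypothetical helper.

```python
import json


def find_orphaned_file_ids(knowledge_json: str, existing_file_ids: set[str]) -> list[str]:
    """Return file IDs listed by a knowledge base that have no matching file record.

    Assumption: the knowledge record stores its files under data["file_ids"].
    """
    knowledge = json.loads(knowledge_json)
    listed = (knowledge.get("data") or {}).get("file_ids", [])
    orphans: list[str] = []
    for file_id in listed:
        # Keep the original order and report each dangling ID only once.
        if file_id not in existing_file_ids and file_id not in orphans:
            orphans.append(file_id)
    return orphans


kb_response = '{"id": "kb-1", "data": {"file_ids": ["a1", "b2", "c3", "b2"]}}'
print(find_orphaned_file_ids(kb_response, existing_file_ids={"a1", "c3"}))  # ['b2']
```

Any IDs it returns are references that outlived their file records, like the file ID that stays visible through /api/v1/knowledge/{id} in the scenario above.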

The only ways to "recover" from the above situation (aside from resetting/recreating the knowledge base) that I've found were:

  1. Reindexing knowledge files, but this can take hours with a sufficiently large knowledge base.
  2. Interacting with the lower-level API of the underlying database to find the relevant entries and delete them. However, when I did this, the /workspace/knowledge/{id} page continued to show the problematic file ID, even though the file itself seemed to be gone and I no longer got the duplicate content error.

Note that I had this issue while using Qdrant as the vector database backend, but I suspect the same would apply with the default ChromaDB setup.
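The second recovery route plausibly works, while the page still shows the stale ID, because the duplicate check consults the vector store rather than the files table. The model below assumes, without confirming it against Open WebUI's source, that an upload is rejected when an entry with the same content hash already exists in the target collection; `add_to_collection`, `purge_file` and `DuplicateContentError` are hypothetical names. It shows a leftover entry from a deleted file blocking a re-upload until the entries with that `file_id` are purged.

```python
import hashlib


class DuplicateContentError(Exception):
    """Hypothetical error mirroring the "duplicate content" message."""


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def add_to_collection(collection: list[dict], file_id: str, text: str) -> None:
    digest = content_hash(text)
    # The check only sees what is in the collection; it cannot tell whether
    # the file that produced a matching entry still exists.
    if any(entry["hash"] == digest for entry in collection):
        raise DuplicateContentError("duplicate content")
    collection.append({"file_id": file_id, "hash": digest, "text": text})


def purge_file(collection: list[dict], file_id: str) -> int:
    """Delete every entry belonging to file_id in place; return how many were removed."""
    before = len(collection)
    collection[:] = [entry for entry in collection if entry["file_id"] != file_id]
    return before - len(collection)


knowledge_collection: list[dict] = []
add_to_collection(knowledge_collection, "old-id", "quarterly report")
# The file record for "old-id" is deleted, but its vector entry is left behind.
try:
    add_to_collection(knowledge_collection, "new-id", "quarterly report")
except DuplicateContentError as err:
    print("re-upload rejected:", err)  # re-upload rejected: duplicate content
print(purge_file(knowledge_collection, "old-id"))  # 1
add_to_collection(knowledge_collection, "new-id", "quarterly report")  # now succeeds
```

Purging the vectors alone does not touch the knowledge record's own list of file IDs, which fits the observation that the page kept showing the file ID: both stores have to be cleaned.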


@xylobol commented on GitHub (Aug 11, 2025):

I can confirm that this happens with the stock ChromaDB setup. My workflow is hitting DELETE /api/v1/files/{id}.


@Classic298 commented on GitHub (Aug 22, 2025):

Hey guys.

I built something that may be interesting to you.
Testing wanted on this PR:

https://github.com/open-webui/open-webui/pull/16520

Reference: github-starred/open-webui#2751