mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-10 15:54:15 -05:00
File deletion doesn't properly clean up database entries, causing issues with re-uploads #2751
Originally created by @eml-henn on GitHub (Nov 21, 2024).
Bug Report
Installation Method
Kubernetes on Azure Kubernetes Service
Environment
Open WebUI Version: 0.4.0
Operating System: AKSUbuntu-2204
Browser: Firefox 132.0.2
Confirmation:
Expected Behavior:
Actual Behavior:
Description
Bug Summary:
When deleting a processed file from a knowledge base through the frontend, the file appears to be removed from the UI but its content sometimes remains in the vector database. This creates issues when trying to re-upload the same file, as the system detects it as duplicate content.
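The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not Open WebUI's code: `ToyVectorStore`, `add_document`, `remove_file`, and `remove_from_ui_only` are hypothetical names, and the only assumption taken from the report is that duplicates are detected through a content hash stored with the vectors (as the "Document with hash ... already exists" log line suggests).

```python
import hashlib


class ToyVectorStore:
    """In-memory stand-in for a vector collection (hypothetical, for illustration)."""

    def __init__(self):
        self.hashes = {}  # content hash -> file_id that stored it

    def add_document(self, file_id, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self.hashes:
            # Mirrors the behaviour seen in the logs: same content is rejected.
            raise ValueError("Duplicate content detected.")
        self.hashes[digest] = file_id

    def remove_file(self, file_id):
        # A complete removal also drops every hash owned by this file.
        self.hashes = {h: f for h, f in self.hashes.items() if f != file_id}


def remove_from_ui_only(ui_files, file_id):
    """Simulates the incomplete path: the UI list changes, the store does not."""
    ui_files.discard(file_id)
```

In this model, calling `remove_from_ui_only` and then `add_document` with the same content raises `ValueError`, while calling `remove_file` first lets the re-upload succeed.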
Reproduction Details
This is frustratingly a "sometimes" error. So my proposed solution would be to add logging to make it easier to reproduce.
Steps to Reproduce:
Logs and Screenshots
Docker Container Logs:
// When removing a file:
INFO: 10.1.1.4:0 - "POST /api/v1/knowledge/{id}/file/remove HTTP/1.1" 200 OK
// When trying to re-upload (error due to remaining content):
INFO: 10.1.1.4:0 - "GET /api/v1/knowledge/{id} HTTP/1.1" 200 OK
INFO [open_webui.apps.webui.routers.files] file.content_type: text/plain
INFO [open_webui.apps.retrieval.main] save_docs_to_vector_db: document {file} {file_collection_id}
INFO [open_webui.apps.retrieval.main] adding to collection {file_collection_id}
Collection {file_collection_id} does not exist.
INFO: 10.1.1.4:0 - "POST /api/v1/files/ HTTP/1.1" 200 OK
INFO [open_webui.apps.retrieval.main] save_docs_to_vector_db: document {file} {id}
INFO [open_webui.apps.retrieval.main] Document with hash {file hash} already exists
ERROR [open_webui.apps.retrieval.main] Duplicate content detected. Please provide unique content to proceed.
Traceback (most recent call last):
File "/app/backend/open_webui/apps/retrieval/main.py", line 1001, in process_file
raise e
File "/app/backend/open_webui/apps/retrieval/main.py", line 975, in process_file
result = save_docs_to_vector_db(
^^^^^^^^^^^^^^^^^^^^^^^
File "/app/backend/open_webui/apps/retrieval/main.py", line 759, in save_docs_to_vector_db
raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
ValueError: Duplicate content detected. Please provide unique content to proceed.
INFO: 10.1.1.4:0 - "POST /api/v1/knowledge/{id}/file/add HTTP/1.1" 400 Bad Request
// Compare with Logs when adding a file:
INFO: 10.1.1.222:0 - "POST /api/v1/files/ HTTP/1.1" 200 OK
INFO [open_webui.apps.webui.routers.files] file.content_type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
INFO [open_webui.apps.retrieval.main] save_docs_to_vector_db: document {filename} {collection_id}
INFO [open_webui.apps.retrieval.main] collection {collection_id} already exists
INFO [open_webui.apps.retrieval.main] adding to collection {collection_id}
Additional Information
The issue appears to be in the file removal endpoint (@router.post("/{id}/file/remove")) in knowledge.py. Currently:
Proposed Solution
@tjbck commented on GitHub (Nov 22, 2024):
Would love to investigate more but we'll need a more reliable way to reproduce the issue, definitely continue our troubleshooting journey and keep us posted!
@sreinwald commented on GitHub (Dec 2, 2024):
I just ran across this issue as well and I can reproduce it 100% using the API on v0.4.7, deployed via docker compose.
Steps to reproduce:
In my specific example:
The issue with re-uploads very much seems to be related to the issue above, and I can reproduce it 100% with these steps using the API directly:
400: Duplicate content detected
@Constey commented on GitHub (Dec 8, 2024):
My assumption was that I could just call the API with the same file again to do a re-upload.
It seems the upload of the file works, but adding the file to the knowledge base returns the 400 Bad Request error.
Running on: v0.4.8 (latest)
Steps to Reproduce:
Upload a file
add the file to the knowledgebase
Upload the file again
Try adding it to the knowledgebase again
Uploaded successfully with file_id: 967dd000-429c-46bb-9931-f352364dd746
Adding file 967dd000-429c-46bb-9931-f352364dd746 to knowledge def088e5-a452-4dd9-b67d-4de942f3785b...
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://xxx/api/v1/knowledge/def088e5-a452-4dd9-b67d-4de942f3785b/file/add
My test script for upload:
import requests


def add_file_to_knowledge(token, knowledge_id, file_id, base_url):
    # Link an already-uploaded file to a knowledge base.
    url = f'{base_url}/api/v1/knowledge/{knowledge_id}/file/add'
    headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json'
    }
    data = {'file_id': file_id}
    response = requests.post(url, headers=headers, json=data)
    # Raises requests.exceptions.HTTPError on the 400 shown above.
    response.raise_for_status()
    return response.json()
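The four reproduction steps can be sketched as a single function. This is only a sketch under assumptions: the multipart field name `file` and the `id` key in the upload response are not confirmed anywhere in this thread, and `reproduce_duplicate` is a hypothetical helper. The endpoints themselves are the ones visible in the logs and in the script above.

```python
def reproduce_duplicate(session, base_url, knowledge_id, path):
    """Upload the same file twice and try to add each upload to one knowledge base.

    `session` is any object with a requests-style `post` method.
    Returns the HTTP status codes of the two /file/add calls.
    """
    statuses = []
    for _ in range(2):
        with open(path, "rb") as fh:
            # Assumption: the upload endpoint expects a multipart field named "file".
            upload = session.post(f"{base_url}/api/v1/files/", files={"file": fh})
        upload.raise_for_status()
        file_id = upload.json()["id"]  # Assumption: the response carries an "id" key.
        added = session.post(
            f"{base_url}/api/v1/knowledge/{knowledge_id}/file/add",
            json={"file_id": file_id},
        )
        # No raise_for_status here: the 400 is the result we want to observe.
        statuses.append(added.status_code)
    return statuses
```

With the `requests` library, `session` would be a `requests.Session()` whose headers include the `Authorization: Bearer ...` value. On the versions discussed here, the reports above suggest the second add comes back as 400.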
@Constey commented on GitHub (Dec 8, 2024):
I think the issue is somewhere here located:
29a2719595/backend/open_webui/apps/webui/routers/knowledge.py (L244)
@AlgorithmicKing737 commented on GitHub (Dec 31, 2024):
any solution yet?
@Classic298 commented on GitHub (Jan 13, 2025):
I want to add to this issue, that files uploaded in normal chats are not deleted from the vector database either. Even if you delete the chat, the vector database does not shrink. It stays the same size. And in fact, it only grows.
Even if you press "reset vector-storage" in the admin panel under documents, nothing gets deleted from the database.
So 1) nothing gets deleted even if the chat where the file was uploaded is deleted and 2) the reset vectorstorage button also doesn't do anything.
I am on version 0.5.4 but this was always the case for me on previous versions as well. I am on pip installation if that matters and this issue has been discussed here as well: https://github.com/open-webui/open-webui/discussions/5558
@juananpe commented on GitHub (Jan 13, 2025):
@Classic298 Oh, I see. My PR https://github.com/open-webui/open-webui/pull/8499 fixes the situation when you remove a file that has been added via a Knowledge Base, but it doesn't fix the problem when the file is added directly from the Upload Documents option in the chat. I'll have a look at it tomorrow.
@Classic298 commented on GitHub (Jan 22, 2025):
Was the issue with files not being deleted from the db even after deleting the chat fixed?
@tjbck commented on GitHub (Jan 22, 2025):
Everything uploaded to Open WebUI is being kept for audit/logging purposes which is a security requirement for many organisations. You should utilise external scripts to clean the upload directory for now!
@Classic298 commented on GitHub (Jan 23, 2025):
Then why was the deletion of files, when deleting them from the knowledge base, even implemented and accepted by you? If files should not get deleted.
And even the implementation of file deletion from chats was accepted and merged by you - it is 90% implemented. Only the actual deletion logic for the chroma db is missing.
Maybe with an environment variable or admin setting (either is fine), it would be cool to be able to set this.
An ever growing chroma database and uploads folder will grow to be a problem relatively quickly, no?
@Classic298 commented on GitHub (Jan 25, 2025):
This issue was not fixed yet as there is literally a placeholder for the missing code, just saying. Writing in the commit notes that 7181 is fixed is weird
@Classic298 commented on GitHub (Jan 29, 2025):
Bump; - issue is not fixed and current implementation goes against ethos that Tim described.
@tjbck commented on GitHub (Jan 29, 2025):
Reverted #8499
4abede9a2b
@Jeevanhm commented on GitHub (Feb 6, 2025):
can you share the final script used to upload and add files to the knowledge base please.
@Jeevanhm commented on GitHub (Feb 7, 2025):
Getting this error while adding files to Knowledge Collections.. any idea? Uploading and deleting the files works.
C:\Windows\system32>curl -X POST http://192.xx.xx.xx:/api/v1/knowledge/fb24ac30-d611-4988-90dc-b29fe10d118a/file/add -H "Authorization: Bearer sk-f821a6733a024915932dc30ed44b2d4a" -H "Content-Type: application/json" -d '{"file_id": "430e0a59-5fea-4d4e-87c1-8f2ad38c3dda"}'
{"detail":[{"type":"json_invalid","loc":["body",0],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting value"}}]}
curl: (3) unmatched close brace/bracket in URL position 37:
430e0a59-5fea-4d4e-87c1-8f2ad38c3dda}'
^
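The error output is consistent with cmd.exe not treating single quotes as quoting characters, so the JSON body gets split and its second half is parsed as a URL. One way to sidestep shell quoting is to build the request in Python. The sketch below uses only the standard library; `build_add_file_request` is a hypothetical helper name, and the endpoint and headers are the ones from the curl command above.

```python
import json
import urllib.request


def build_add_file_request(base_url, token, knowledge_id, file_id):
    """Build (but do not send) the POST request for the /file/add endpoint."""
    # json.dumps produces the body, so no shell quoting is involved.
    body = json.dumps({"file_id": file_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/knowledge/{knowledge_id}/file/add",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. When staying with curl on Windows, the usual alternative is to wrap the body in double quotes and escape the inner ones.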
@Constey commented on GitHub (Feb 7, 2025):
like this, we put the files in separate kb's so its a bit more complex, but should show how it works:
@Jeevanhm commented on GitHub (Feb 7, 2025):
thank you it works like a charm!
@Jeevanhm commented on GitHub (Feb 9, 2025):
@Constey how do you manage the files on the server?
I'm unable to delete specific files using file id but when I try with "all" the files are deleted on the server.
curl -X 'DELETE' 'http://192.xx.xx.xx:8080/api/v1/files/2bc4340d-4b70-477d-b621-714b854c9817' -H 'accept: application/json'
curl -X 'DELETE' 'http://192.xx.xx.xx:8080/api/v1/files/all' -H 'accept: application/json'
@Constey commented on GitHub (Feb 9, 2025):
My initial plan was to just overwrite the files, but since that did not work I've currently just created new KBs and relinked them to the model (and deleted the old KBs manually). I have to test what the current behaviour is (I guess it's not fixed), but if you find a way to delete the old ones, that would be nice.
@ozp commented on GitHub (Feb 10, 2025):
Hello,
I uploaded the OpenWebUI docs to the RAG system following the documentation instructions. This allows me to ask the chat questions about OpenWebUI.
However, I noticed poor performance with the default RAG settings. So, I created a new configuration that requires deleting all previously uploaded files and re-uploading them.
This is when I encountered the duplication issue.
What I’ve Tried So Far:
Request:
Could you provide a step-by-step guide on how to completely remove all previously indexed files from the knowledge base? I’d really appreciate it.
@gilbrotheraway commented on GitHub (Mar 28, 2025):
in 24h with less than 5 knowledge bases my vector-db folder has:
Total disk usage: 7.7 GiB Apparent size: 7.6 GiB Items: 9881
and it's not even user error it's because uploads fail when uploading many files(github .md docs)
@Constey commented on GitHub (Apr 1, 2025):
I think this issue still exists.
If i have a knowledgebase blowing up my vector db to 10gb and i remove the whole knowledgebase, my space (vector db) will not be freed up.
/var/lib/docker/volumes/open-webui/_data/vector_db
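To check whether removing a knowledge base frees any space, the directory size can be measured before and after. This is a minimal standard-library sketch; `directory_size_bytes` is a hypothetical helper, and the path to pass in is whatever your deployment uses (the one above is from a Docker volume).

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files below `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Running it on the `vector_db` folder before and after deleting a knowledge base shows directly whether anything was reclaimed.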
@Classic298 commented on GitHub (Apr 1, 2025):
yes, according to tim this is intentional :/
@Jean-Reinhold commented on GitHub (Jul 15, 2025):
Hey Guys, I fixed this by modifying my file router to add an endpoint to delete files if they are not attached to a knowledge:
Here is the full code
@bluelight773 commented on GitHub (Jul 27, 2025):
It seems to me that it's still the case (as of v0.6.18) that if you:
/workspace/knowledge/{id})
/api/v1/files/{id} DELETE endpoint.
Then when viewing the knowledge base in the UI (/workspace/knowledge/{id}), you'll no longer see that file listed for the knowledge base. However, if you try re-adding the same file via the UI, you'll get a "duplicate content" error.
I was able to see the file ID listed when calling the /api/v1/knowledge/{id} endpoint, but I was not able to remove it from the knowledge base using any endpoint related to deletion/removal provided by Open WebUI.
The only ways to "recover" from the above situation (aside from resetting/recreating the knowledge base) that I've found were:
/workspace/knowledge/{id} continued to show the problematic file ID even though it seems like the file itself is gone and I no longer got the "duplicate content" error.
Note that I had this issue while using Qdrant as the database backend, but suspect the same would apply with the default ChromaDB setup.
@xylobol commented on GitHub (Aug 11, 2025):
I can confirm that this happens with the stock ChromaDB setup. My workflow is hitting DELETE /api/v1/files/{id}.
@Classic298 commented on GitHub (Aug 22, 2025):
Hey guys.
I built something that may be interesting to you.
Testing wanted on this PR:
https://github.com/open-webui/open-webui/pull/16520