Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-09 23:35:09 -05:00)
issue: File Hash Remains in DB after File Deletion #6348
Originally created by @rahepler2 on GitHub (Sep 9, 2025).
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.25
Ollama Version (if applicable)
No response
Operating System
Linux docker container
Browser (if applicable)
All
Confirmation
Expected Behavior
Upload files to a knowledge collection. A file should be removable, and a new file of the same name should be addable afterwards. This is a frequent workflow when working with revised documents.
Actual Behavior
When I upload documents to a knowledge collection the first time, things work fine. But if I then remove one or more files, the file data on the backend isn't removed: the frontend no longer shows the file, but the file hash and name are retained in the DB.
When I then upload the new version of the file, it is rejected because duplicate values are found in the DB.
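A minimal sketch of the failure mode described above (names and storage are hypothetical, not Open WebUI's actual code): a content-hash dedup check rejects a re-upload whenever file deletion fails to remove the stored hash.

```python
import hashlib

# Simulated store of file hashes; a stale entry here mimics a DB row
# that survives file deletion.
file_hashes: dict[str, str] = {}

def upload(name: str, content: bytes) -> None:
    """Reject the upload if the content hash is already recorded."""
    digest = hashlib.sha256(content).hexdigest()
    if digest in file_hashes:
        raise ValueError(
            "Duplicate content detected. Please provide unique content to proceed."
        )
    file_hashes[digest] = name

def delete_file(name: str, *, remove_hash: bool) -> None:
    """Buggy deletion path: the file goes away but the hash row may remain."""
    if remove_hash:
        for digest, n in list(file_hashes.items()):
            if n == name:
                del file_hashes[digest]

doc = b"revision 1"
upload("report.pdf", doc)
delete_file("report.pdf", remove_hash=False)  # hash row left behind
try:
    upload("report.pdf", doc)                 # rejected despite "deletion"
except ValueError as e:
    print(e)
```

With `remove_hash=True` the second `upload` would succeed, which is the expected behavior above.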
Steps to Reproduce
Logs & Screenshots
```
File "/app/backend/open_webui/routers/retrieval.py", line 1511, in process_file
  result = save_docs_to_vector_db(
  └ <function save_docs_to_vector_db at 0x70a5a3566200>

File "/app/backend/open_webui/routers/retrieval.py", line 1197, in save_docs_to_vector_db
  raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
  │ └ <ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>
  └ <enum 'ERROR_MESSAGES'>

ValueError: Duplicate content detected. Please provide unique content to proceed.
```
Additional Information
It does seem that a user has an open PR to fix orphaned records, but I had not experienced this previously when using pgvector on-prem versus on Azure.
@Classic298 commented on GitHub (Sep 9, 2025):
PLEASE actually search for related discussions and issues.
There are more than 20!
Here is a related PR that should solve this:
https://github.com/open-webui/open-webui/pull/16520
@rahepler2 commented on GitHub (Sep 9, 2025):
@Classic298 Thanks for the response!
I did search existing issues, and while I found related problems (duplication issues, orphaned data, and your PR), my issue appears to have additional factors that may be specific to Azure PostgreSQL.
Our scenario:
The key difference from existing issues:
Data spans the `file` and `document_chunk` tables; deletion removes the record from the `file` table but leaves chunks in the `document_chunk` table.
This might be related to your PR, but I wanted to document this specific Azure-related behavior in case there are additional considerations needed for managed PostgreSQL services (permissions, cascade deletes, transaction handling, etc.).
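One way to verify whether deletions really leave orphaned chunk rows behind is a LEFT JOIN against the file table, selecting chunks whose parent file row is gone. The table and column names below are illustrative (borrowed from the comment above), not necessarily Open WebUI's actual schema; the sketch uses an in-memory SQLite DB so it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE document_chunk (id INTEGER PRIMARY KEY, file_id TEXT, text TEXT);
    INSERT INTO file VALUES ('f1', 'kept.pdf');
    INSERT INTO document_chunk (file_id, text) VALUES
        ('f1', 'chunk of kept.pdf'),
        ('f2', 'orphan: parent file row was deleted');
""")

# Chunks whose parent file row no longer exists.
orphans = conn.execute("""
    SELECT c.id, c.file_id
    FROM document_chunk AS c
    LEFT JOIN file AS f ON f.id = c.file_id
    WHERE f.id IS NULL
""").fetchall()
print(orphans)
```

A foreign key with `ON DELETE CASCADE` would let the database remove such chunks automatically when the parent file row is deleted; whether a managed service like Azure PostgreSQL changes any of that behavior is the open question in this thread.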
Hopefully, your PR will go through soon!
@Classic298 commented on GitHub (Sep 10, 2025):
Hm, I am not sure whether Azure-hosted PostgreSQL is officially supported.
As per the docs and the environment variable configuration files, SQLite and PostgreSQL are supported, which raises the question of whether Azure PostgreSQL behaves the same as self-hosted PostgreSQL.
@rahepler2 commented on GitHub (Sep 10, 2025):
@Classic298 Yeah, Azure Postgres Flexible Server lets you turn on the pgvector features, but I don't know the internals well enough to say what's causing the issue. I'll keep looking through the logs and report anything I find here. Thanks again.
@tjbck commented on GitHub (Sep 11, 2025):
Unable to reproduce here, and this should have nothing to do with the database, only with the vector DB. Keep us updated!