issue: File Hash Remains in DB after File Deletion #6348

Closed
opened 2025-11-11 16:52:14 -06:00 by GiteaMirror · 5 comments

Originally created by @rahepler2 on GitHub (Sep 9, 2025).

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.25

Ollama Version (if applicable)

No response

Operating System

Linux docker container

Browser (if applicable)

All

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

I upload files to a knowledge collection. I should then be able to remove a file and add back a new file with the same name. This is a common workflow when working with revised documents.

Actual Behavior

When I upload documents to a knowledge collection the first time, things work fine. If I then remove one or more files, the file data on the backend isn't removed. The frontend no longer shows the file, but the file hash and name are retained in the DB.

Then when I go to upload the new version of the file, the upload is rejected because duplicate values are found in the DB.
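
To make the failure mode concrete, here is a minimal sketch of hash-based duplicate detection. It is not Open WebUI's actual code: `ChunkStore`, `DuplicateContentError`, and the `purge_hashes` flag are hypothetical names invented for illustration, and an in-memory dict stands in for the real store. It only shows why a re-upload is rejected when a delete leaves the old content hash behind.

```python
import hashlib


class DuplicateContentError(ValueError):
    """Hypothetical error, standing in for the duplicate-content ValueError."""


class ChunkStore:
    """Toy in-memory stand-in for a collection keyed by content hash."""

    def __init__(self):
        self._by_hash = {}  # content hash -> file name

    def add(self, name, content):
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._by_hash:
            # A leftover hash from a "deleted" file triggers this branch.
            raise DuplicateContentError("Duplicate content detected.")
        self._by_hash[digest] = name
        return digest

    def delete(self, name, purge_hashes=True):
        # purge_hashes=False models a delete that hides the file from the
        # frontend but leaves its hash record in the backend.
        if purge_hashes:
            self._by_hash = {
                h: n for h, n in self._by_hash.items() if n != name
            }


def reupload_succeeds(purge_hashes):
    """Upload, delete, then re-upload the same content under the same name."""
    store = ChunkStore()
    store.add("report.pdf", b"same bytes")
    store.delete("report.pdf", purge_hashes=purge_hashes)
    try:
        store.add("report.pdf", b"same bytes")
        return True
    except DuplicateContentError:
        return False
```

In this sketch `reupload_succeeds(False)` reproduces the rejection, while `reupload_succeeds(True)` shows the behavior I would expect after a clean delete.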

Steps to Reproduce

  1. Create a Docker container with the latest release of Open WebUI.
  2. Connect it to Azure Database for PostgreSQL.
  3. Create a knowledge collection and add documents to it.
  4. Delete the documents, or a single document.
  5. Re-upload a document with the same name as the previously removed document.
  6. You receive a 400 error saying that the document is a duplicate.

Logs & Screenshots

2025-09-09T13:50:15.3010935Z stdout F
2025-09-09T13:50:15.3010957Z stdout F File "/app/backend/open_webui/routers/retrieval.py", line 1511, in process_file
2025-09-09T13:50:15.3010978Z stdout F result = save_docs_to_vector_db(
2025-09-09T13:50:15.3011002Z stdout F └ <function save_docs_to_vector_db at 0x70a5a3566200>
2025-09-09T13:50:15.3011023Z stdout F
2025-09-09T13:50:15.3019496Z stdout F File "/app/backend/open_webui/routers/retrieval.py", line 1197, in save_docs_to_vector_db
2025-09-09T13:50:15.3019662Z stdout F raise ValueError(ERROR_MESSAGES.DUPLICATE_CONTENT)
2025-09-09T13:50:15.3019680Z stdout F │ └ <ERROR_MESSAGES.DUPLICATE_CONTENT: 'Duplicate content detected. Please provide unique content to proceed.'>
2025-09-09T13:50:15.3020293Z stdout F └ <enum 'ERROR_MESSAGES'>
2025-09-09T13:50:15.3020353Z stdout F
2025-09-09T13:50:15.3020388Z stdout F ValueError: Duplicate content detected. Please provide unique content to proceed.

Additional Information

It does seem that a user has an open PR to fix orphaned records, but I did not experience this previously when using pgvector on-prem, only after moving to Azure.

GiteaMirror added the bug label 2025-11-11 16:52:14 -06:00

@Classic298 commented on GitHub (Sep 9, 2025):

PLEASE actually search for related discussions and issues.
There are more than 20!

Here is a related PR that should solve this:

https://github.com/open-webui/open-webui/pull/16520


@rahepler2 commented on GitHub (Sep 9, 2025):

@Classic298 Thanks for the response!

I did search existing issues, and while I found related problems (duplication issues, orphaned data, and your PR), my issue appears to have additional factors that may be specific to Azure PostgreSQL.

Our scenario:

  • Before: On-premise pgvector DB - all CRUD operations worked correctly (delete file → re-upload new version worked fine)
  • After: Azure Database for PostgreSQL - deletes appear successful in UI but don't actually remove data from backend

The key difference from existing issues:

  1. The UI shows the file is deleted from the collection, but data persists in both file and document_chunk tables
  2. It's not just orphaned data - the duplicate detection actively prevents re-uploading because the hash still exists
  3. The API endpoint for deleting files removes records from file table but leaves chunks in document_chunk table
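
A rough way to check for the leftover rows described in the list above is an anti-join between the two tables. This is only a sketch under assumptions: the table names `file` and `document_chunk` are from our database, but the `file_id` column linking chunks to files and the other columns are hypothetical simplifications, and SQLite stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3


def find_orphaned_chunks(conn):
    """Return ids of chunks whose parent file row no longer exists."""
    rows = conn.execute(
        """
        SELECT c.id
        FROM document_chunk AS c
        LEFT JOIN file AS f ON f.id = c.file_id
        WHERE f.id IS NULL
        ORDER BY c.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo_orphans():
    """Delete a file row only, then list the chunks left behind."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified schema; the real tables have more columns.
    conn.executescript(
        """
        CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT, hash TEXT);
        CREATE TABLE document_chunk (
            id TEXT PRIMARY KEY, file_id TEXT, text TEXT
        );
        INSERT INTO file VALUES ('f1', 'report.pdf', 'abc');
        INSERT INTO document_chunk VALUES ('c1', 'f1', 'part 1');
        INSERT INTO document_chunk VALUES ('c2', 'f1', 'part 2');
        """
    )
    before = find_orphaned_chunks(conn)
    # Remove the file record but not its chunks, as in point 3 above.
    conn.execute("DELETE FROM file WHERE id = 'f1'")
    after = find_orphaned_chunks(conn)
    conn.close()
    return before, after
```

Here `demo_orphans()` returns an empty list before the delete and both chunk ids afterwards, which matches the state we see in the database.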

This might be related to your PR, but I wanted to document this specific Azure-related behavior in case there are additional considerations needed for managed PostgreSQL services (permissions, cascade deletes, transaction handling, etc.).
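
For the transaction-handling angle, one possible approach is to remove the chunks and the file record in a single transaction, so a failure leaves both in place rather than half-deleting. This is a sketch under assumptions, not how Open WebUI does it: `delete_file_and_chunks` and the `file_id` column are hypothetical, and SQLite again stands in for PostgreSQL.

```python
import sqlite3


def delete_file_and_chunks(conn, file_id):
    """Delete a file and its chunks atomically."""
    # Used as a context manager, the connection commits on success and
    # rolls back if either statement raises.
    with conn:
        conn.execute(
            "DELETE FROM document_chunk WHERE file_id = ?", (file_id,)
        )
        conn.execute("DELETE FROM file WHERE id = ?", (file_id,))


def demo_delete():
    """Return (file rows, chunk rows) remaining after the delete."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified schema for illustration only.
    conn.executescript(
        """
        CREATE TABLE file (id TEXT PRIMARY KEY, filename TEXT, hash TEXT);
        CREATE TABLE document_chunk (
            id TEXT PRIMARY KEY, file_id TEXT, text TEXT
        );
        INSERT INTO file VALUES ('f1', 'report.pdf', 'abc');
        INSERT INTO document_chunk VALUES ('c1', 'f1', 'part 1');
        INSERT INTO document_chunk VALUES ('c2', 'f1', 'part 2');
        """
    )
    delete_file_and_chunks(conn, "f1")
    files = conn.execute("SELECT COUNT(*) FROM file").fetchone()[0]
    chunks = conn.execute("SELECT COUNT(*) FROM document_chunk").fetchone()[0]
    conn.close()
    return files, chunks
```

If a managed service handled commits or permissions differently, a delete like this would either fully succeed or fully fail, which would make the cause easier to spot in the logs.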

Hopefully, your PR will go through soon!


@Classic298 commented on GitHub (Sep 10, 2025):

Hm, I am not sure whether Azure-hosted PostgreSQL is officially supported.

As per the docs and the environment variable configuration files, SQLite and PostgreSQL are supported, which raises the question of whether Azure PostgreSQL behaves the same as self-hosted PostgreSQL.


@rahepler2 commented on GitHub (Sep 10, 2025):

@Classic298 Yeah, the flexible server in Azure Postgres lets you turn on the pgvector features, but I don't understand how it works under the hood well enough to know what's causing the issue. I'll keep looking through the logs and report what I find here. Thanks again!


@tjbck commented on GitHub (Sep 11, 2025):

Unable to reproduce here, and this should have nothing to do with the database, only with the vector DB. Keep us updated!


Reference: github-starred/open-webui#6348