mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 10:58:17 -05:00
[GH-ISSUE #23227] issue: Redundant file uploads to /uploads directory before duplicate check in Knowledge Base #58590
Originally created by @aaquerrmon on GitHub (Mar 30, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23227
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.8.12
Ollama Version (if applicable)
No response
Operating System
Ubuntu 25.10
Browser (if applicable)
No response
Confirmation
Expected Behavior
The system should perform a deduplication check before permanently storing the file in the /app/backend/data/uploads/ directory.
1. Content Hashing: The server should calculate a hash (e.g., SHA-256) of the incoming file stream.
2. Pre-storage Validation: If a file with the same hash already exists in the database or the upload folder, the system should either:
   - Reuse the existing file path for the new Knowledge Base entry.
   - Reject the upload immediately without writing a new physical file to disk.
3. Storage Efficiency: The /uploads folder should not contain multiple byte-for-byte identical files, regardless of how many times the API is called or how the files are named.
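The expected behavior above could look roughly like the following. This is only a minimal sketch, not Open WebUI's actual code: `store_deduplicated`, `CHUNK_SIZE`, and the in-memory `index` dict (a stand-in for a database lookup from hash to path) are hypothetical names. Because the hash is computed while streaming, the bytes go to a temporary `.part` file first, and that file is discarded if the hash is already known.

```python
import hashlib
import os
import tempfile
from typing import BinaryIO, Dict, Tuple

CHUNK_SIZE = 1024 * 1024  # read the stream in 1 MiB pieces (arbitrary choice)


def store_deduplicated(
    stream: BinaryIO, upload_dir: str, index: Dict[str, str]
) -> Tuple[str, bool]:
    """Store `stream` under `upload_dir` unless identical content already exists.

    `index` maps SHA-256 hex digest -> stored path; it is a hypothetical
    stand-in for whatever database table would hold this mapping.
    Returns (path, created), where created is False when an existing file is reused.
    """
    os.makedirs(upload_dir, exist_ok=True)
    digest = hashlib.sha256()
    fd, tmp_path = tempfile.mkstemp(dir=upload_dir, suffix=".part")
    try:
        # Hash and spool the stream in one pass.
        with os.fdopen(fd, "wb") as tmp:
            while chunk := stream.read(CHUNK_SIZE):
                digest.update(chunk)
                tmp.write(chunk)

        key = digest.hexdigest()
        existing = index.get(key)
        if existing is not None and os.path.exists(existing):
            # Duplicate content: drop the temporary copy and reuse the old path.
            os.remove(tmp_path)
            return existing, False

        # New content: move the temporary file into place, named by its hash.
        final_path = os.path.join(upload_dir, key)
        os.replace(tmp_path, final_path)
        index[key] = final_path
        return final_path, True
    except BaseException:
        # Never leave a partial file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Naming the stored file by its digest is one possible design choice; it makes the "same bytes, different filename" case collapse to a single path without a separate lookup.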
Actual Behavior
When uploading a file to the Knowledge Base, the system currently performs two distinct actions: upload_file and add_to_knowledge.
The issue is that upload_file saves the file to the container's directory (/app/backend/data/uploads/) before the system checks whether the file already exists in the Knowledge Base collection. If the file is a duplicate, the second call (add_to_knowledge) fails with a "duplicate file" error, but the physical file written by the first call remains in the storage folder.
Steps to Reproduce
Logs & Screenshots
root@xxx:/app/backend# cd /app/backend/data/uploads/
root@xxx:/app/backend/data/uploads# ls -1 | wc -l
35
After uploading a duplicate file:
root@xxx:/app/backend/data/uploads# ls -1 | wc -l
36
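The file count alone does not prove that the new file is byte-for-byte identical to an existing one. A small script along these lines could confirm it by grouping files by SHA-256. It is a diagnostic sketch only; `sha256_of` and `find_duplicates` are hypothetical helper names, not part of Open WebUI.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(directory: str) -> Dict[str, List[str]]:
    """Map digest -> file names, keeping only digests shared by 2+ files."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            groups[sha256_of(entry)].append(entry.name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}
```

Calling `find_duplicates("/app/backend/data/uploads")` inside the container would list every group of identical files, which should include the re-uploaded one.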
Additional Information
Ideally, the system should implement a content-based hash check (e.g., SHA-256) before writing the file to disk.
If a file with the same hash exists, the system should link the existing path instead of creating a new physical file.
Alternatively, provide an endpoint or parameter to verify if a file hash already exists in the global knowledge base before initiating the upload.
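On the client side, the alternative (a hash lookup before upload) might be used as sketched below. No such endpoint or parameter exists in the report, so `hash_exists` and `upload` are hypothetical callables standing in for the proposed lookup and for the existing upload call.

```python
import hashlib
from typing import Callable


def file_sha256(path: str) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def upload_if_new(
    path: str,
    hash_exists: Callable[[str], bool],  # hypothetical: asks the server about a digest
    upload: Callable[[str], None],       # hypothetical: performs the real upload
) -> bool:
    """Upload `path` only when the server does not already know its content.

    Returns True if the upload was performed, False if it was skipped.
    """
    digest = file_sha256(path)
    if hash_exists(digest):
        return False
    upload(path)
    return True
```

A check like this saves bandwidth but is advisory: two clients could pass the check at the same time, so the server would still need its own deduplication at write time.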