[GH-ISSUE #23227] issue: Redundant file uploads to /uploads directory before duplicate check in Knowledge Base #58590

Closed
opened 2026-05-05 23:29:43 -05:00 by GiteaMirror · 0 comments

Originally created by @aaquerrmon on GitHub (Mar 30, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23227

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.8.12

Ollama Version (if applicable)

No response

Operating System

Ubuntu 25.10

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The system should perform a deduplication check before permanently storing the file in the /app/backend/data/uploads/ directory.

Content Hashing: The server should calculate a hash (e.g., SHA-256) of the incoming file stream.

Pre-storage Validation: If a file with the same hash already exists in the database or the upload folder, the system should either:

  • Reuse the existing file path for the new Knowledge Base entry, or
  • Reject the upload immediately without writing a new physical file to disk.

Storage Efficiency: The /uploads folder should not contain multiple byte-for-byte identical files, regardless of how many times the API is called or how the files are named.
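
The behavior requested above could look roughly like the following sketch. It is not Open WebUI code: `store_deduplicated` and the `index` dictionary (a stand-in for a database table mapping hashes to paths) are hypothetical names chosen for illustration. Because the hash of a stream is only known once the whole stream has been read, the sketch spools to a temporary file and promotes it to a permanent file only if the content is new.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # read the stream in 1 MiB pieces


def store_deduplicated(
    stream: BinaryIO, upload_dir: Path, index: dict[str, Path]
) -> tuple[Path, bool]:
    """Store `stream` under `upload_dir` unless identical content is indexed.

    Returns (path, created). `index` maps SHA-256 hex digests to stored
    paths; it is a hypothetical stand-in for a database lookup.
    """
    upload_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()

    # Hash while spooling to a temporary file in the same directory,
    # so that promoting it later is a simple rename.
    with tempfile.NamedTemporaryFile(dir=upload_dir, delete=False) as tmp:
        tmp_path = Path(tmp.name)
        while chunk := stream.read(CHUNK_SIZE):
            digest.update(chunk)
            tmp.write(chunk)

    key = digest.hexdigest()
    existing = index.get(key)
    if existing is not None:
        tmp_path.unlink()  # duplicate content: discard the temporary copy
        return existing, False

    final_path = upload_dir / key  # name the stored file after its hash
    tmp_path.replace(final_path)
    index[key] = final_path
    return final_path, True
```

Calling this twice with the same bytes returns the same path both times and leaves a single file in the directory. A caller that prefers the "reject" option could raise an error when `created` is false instead of reusing the path.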

Actual Behavior

When uploading a file to the Knowledge Base, the system currently performs two distinct actions: upload_file and add_to_knowledge.

The issue is that the upload_file step saves the file to the container's directory (/app/backend/data/uploads/) before the system checks whether the file already exists in the Knowledge Base collection. If the file is a duplicate, the second step (add_to_knowledge) fails with a "duplicate file" error, but the physical file written by the first step remains in the storage folder.
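
A toy model of the ordering described above may make the problem easier to see. This is not the actual Open WebUI implementation: the function bodies, the `DuplicateFileError` class, and the in-memory `collection` set are assumptions made purely to illustrate "write first, check second".

```python
import uuid
from pathlib import Path


class DuplicateFileError(Exception):
    """Hypothetical error for content already present in the collection."""


def upload_file(upload_dir: Path, name: str, data: bytes) -> Path:
    # Step 1: the file is written to disk unconditionally.
    path = upload_dir / f"{uuid.uuid4()}_{name}"
    path.write_bytes(data)
    return path


def add_to_knowledge(collection: set[bytes], path: Path) -> None:
    # Step 2: the duplicate check only happens now, after the write.
    data = path.read_bytes()
    if data in collection:
        raise DuplicateFileError(path.name)
    collection.add(data)


def upload_to_knowledge_base(
    upload_dir: Path, collection: set[bytes], name: str, data: bytes
) -> None:
    path = upload_file(upload_dir, name, data)
    # If this raises, nothing removes `path`, so the file stays on disk.
    add_to_knowledge(collection, path)
```

In this model, uploading the same bytes twice raises `DuplicateFileError` on the second call while the upload directory ends up with two files, which matches the reported symptom.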

Steps to Reproduce

  1. Access the Open WebUI container or server and monitor the storage directory: ls -1 /app/backend/data/uploads/ | wc -l.
  2. Upload a file (e.g., manual.pdf) to a Knowledge Base collection via the UI or API.
  3. Verify the file was created in the /uploads/ folder.
  4. Upload the exact same file again to the same or a different collection.
  5. Observe the results:
     • API/UI Level: The second call (add_to_knowledge) returns a "file already exists" or "duplicate" error.
     • Filesystem Level: Check the storage directory again. You will notice a new file has been created (e.g., manual_1.pdf or a UUID-named file), even though the process "failed" at the Knowledge Base integration step.
  6. Repeat several times and notice how the storage usage grows linearly with redundant data.
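
Counting files shows that something was written, but not that the new file is a byte-for-byte copy. One way to confirm that is to group the files in the directory by SHA-256, as in this minimal sketch (the helper names `sha256_of` and `find_duplicates` are illustrative, not part of Open WebUI):

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory: Path) -> dict[str, list[Path]]:
    """Map each digest shared by two or more files to those files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(directory.iterdir()):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    # Keep only digests that occur more than once.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Running `find_duplicates(Path("/app/backend/data/uploads"))` inside the container after step 4 should list the original file and the newly created copy under the same digest if the files are truly identical.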

Logs & Screenshots

root@xxx:/app/backend# cd /app/backend/data/uploads/
root@xxx:/app/backend/data/uploads# ls -1 | wc -l
35

After uploading a duplicate file:
root@xxx:/app/backend/data/uploads# ls -1 | wc -l
36

Additional Information

Ideally, the system should implement a content-based hash check (e.g., SHA-256) before writing the file to disk.

If a file with the same hash exists, the system should link the existing path instead of creating a new physical file.

Alternatively, provide an endpoint or parameter to verify if a file hash already exists in the global knowledge base before initiating the upload.

GiteaMirror added the bug label 2026-05-05 23:29:43 -05:00
Reference: github-starred/open-webui#58590