[PR #20127] [CLOSED] feat: Adding STORAGE_DELETE_LOCAL_AFTER_UPLOAD parameter for delete files from local if storage provider remote + improve performance for remote storage provider #25476

Closed
opened 2026-04-20 05:57:11 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/20127
Author: @ctolon
Created: 12/22/2025
Status: Closed

Base: dev ← Head: feature/storage-remove-local-files-for-remote-providers


📝 Commits (1)

  • 0874e04 1. Improve storage provide performance with local file checking

📊 Changes

5 files changed (+71 additions, -17 deletions)


📝 backend/open_webui/config.py (+2 -0)
📝 backend/open_webui/routers/files.py (+39 -9)
📝 backend/open_webui/routers/knowledge.py (+7 -2)
📝 backend/open_webui/routers/retrieval.py (+14 -0)
📝 backend/open_webui/storage/provider.py (+9 -6)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request, and describe your changes before submitting a pull request.

This is to ensure large feature PRs are discussed with the community first, before starting work on it. If the community does not want this feature or it is not relevant for Open WebUI as a project, it can be identified in the discussion before working on the feature and submitting the PR.

Before submitting, make sure you've checked the following:

  • [x] Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
  • [x] Description: Provide a concise description of the changes made in this pull request down below.
  • [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • [ ] Documentation: If necessary, update relevant documentation Open WebUI Docs like environment variables, the tutorials, or other documentation sources.
  • [ ] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • [x] Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Take this as an opportunity to make screenshots of the feature/fix and include it in the PR description.
  • [x] Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review AND manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • [x] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • [x] Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

This PR introduces two main improvements to Open WebUI's storage system:

  1. Performance Optimization: Remote storage providers (S3, GCS, Azure) now check for local file existence before downloading, preventing redundant network requests and improving response times.
  2. Optional Local Cleanup Feature: A new STORAGE_DELETE_LOCAL_AFTER_UPLOAD environment variable enables automatic cleanup of local file copies after processing when a remote storage provider is used. This frees disk space by removing the temporary local copies while the permanent copies remain in the cloud bucket.
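The local-existence check from point 1 can be sketched roughly as follows. This is a hypothetical illustration, not the PR's code: `CachedRemoteStorage`, `download_fn`, and the assumption that the local copy lives in an upload directory under the key's basename all stand in for the real S3/GCS/Azure providers and their SDK download calls.

```python
import os
import tempfile
from typing import Callable


class CachedRemoteStorage:
    """Hypothetical stand-in for a remote provider (S3, GCS or Azure).

    `download_fn(key, dest_path)` represents the cloud-SDK download call;
    it is an assumption, not an actual Open WebUI or SDK signature.
    """

    def __init__(self, upload_dir: str, download_fn: Callable[[str, str], None]) -> None:
        self.upload_dir = upload_dir
        self.download_fn = download_fn
        self.download_count = 0  # number of network fetches, for illustration

    def get_file(self, key: str) -> str:
        # Assumed layout: the local copy lives in upload_dir under the key's basename.
        local_path = os.path.join(self.upload_dir, os.path.basename(key))
        if os.path.isfile(local_path):
            # Local copy already present: skip the network round-trip.
            return local_path
        # Cache miss: fetch from the bucket into the local upload directory.
        self.download_fn(key, local_path)
        self.download_count += 1
        return local_path


if __name__ == "__main__":
    bucket = {"uploads/report.txt": b"hello"}  # fake remote bucket

    def fake_download(key: str, dest: str) -> None:
        with open(dest, "wb") as f:
            f.write(bucket[key])

    with tempfile.TemporaryDirectory() as tmp:
        storage = CachedRemoteStorage(tmp, fake_download)
        storage.get_file("uploads/report.txt")  # first call downloads
        storage.get_file("uploads/report.txt")  # second call is served locally
        print(storage.download_count)  # 1
```

This shortcut assumes an uploaded object is not modified in the bucket afterwards; if it could be, a stale local copy might be served.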

The cleanup implementation uses FastAPI's BackgroundTasks to ensure files are deleted only after:

  • Complete file processing (vector embeddings, OCR, etc.)
  • Full response streaming (file downloads)
  • All operations requiring local access are finished

This prevents race conditions in which a local file is deleted while it is still in use, and still reclaims local disk space.
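The ordering above (process first, delete later) can be illustrated without FastAPI installed. In FastAPI the real call is `BackgroundTasks.add_task(func, *args, **kwargs)`. The `DeferredTasks` class below is a simplified, hypothetical imitation of that contract, and `process_upload`, `remove_local_copy`, and `handle_request` are illustrative names, not functions from the PR.

```python
import os
import tempfile
from typing import Any, Callable, List, Tuple


class DeferredTasks:
    """Minimal imitation of FastAPI's BackgroundTasks contract: callables
    queued with add_task() run only after the response has been produced."""

    def __init__(self) -> None:
        self._tasks: List[Tuple[Callable[..., Any], Tuple[Any, ...]]] = []

    def add_task(self, func: Callable[..., Any], *args: Any) -> None:
        self._tasks.append((func, args))

    def run(self) -> None:
        for func, args in self._tasks:
            func(*args)


def remove_local_copy(path: str) -> None:
    # Hypothetical cleanup helper: a file that is already gone is not an error.
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def process_upload(path: str, tasks: DeferredTasks, log: list) -> str:
    # Processing (parsing, OCR, embedding) still reads the local file here.
    with open(path, "rb") as f:
        log.append(("processed", len(f.read())))
    # Deletion is only scheduled; nothing is removed while the handler runs.
    tasks.add_task(remove_local_copy, path)
    return "ok"


def handle_request(path: str, log: list) -> None:
    tasks = DeferredTasks()
    response = process_upload(path, tasks, log)
    log.append(("response", response, os.path.exists(path)))
    tasks.run()  # a framework would run these after the response is sent
    log.append(("after_tasks", os.path.exists(path)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = os.path.join(tmp, "doc.txt")
        with open(doc, "wb") as f:
            f.write(b"abc")
        log: list = []
        handle_request(doc, log)
        print(log)
        # [('processed', 3), ('response', 'ok', True), ('after_tasks', False)]
```

The file still exists when the response is built and is gone only after the queued task runs, which is the property the cleanup relies on.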

Added

  • New Environment Variable: STORAGE_DELETE_LOCAL_AFTER_UPLOAD (boolean, default: false)
    • When enabled, automatically removes local file copies after processing
    • Only applies to remote storage providers (S3, GCS, Azure)
    • Uses FastAPI BackgroundTasks for safe asynchronous cleanup
  • Helper Functions:
    • remove_single_file() in files.py and retrieval.py for safe file deletion
  • BackgroundTasks Integration:
    • Added background_tasks parameter to process_file() function
    • Added background_tasks parameter to process_uploaded_file() function
    • Added background_tasks parameter to knowledge base endpoints:
      • add_file_to_knowledge_by_id()
      • update_file_from_knowledge_by_id()
      • reindex_knowledge_files()
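The opt-in flag and its "remote providers only" rule might be wired up roughly as below. This is a sketch under assumptions: the provider names `s3`, `gcs`, `azure` and `local` are assumed, `env_flag` and `should_delete_local` are hypothetical helpers, and the `remove_single_file` shown here is a guess at the helper's behavior, since the PR description does not give its signature.

```python
import os

# Assumed provider names; the exact values used by Open WebUI are not shown in this PR.
REMOTE_PROVIDERS = {"s3", "gcs", "azure"}


def env_flag(name: str, default: str = "false") -> bool:
    # "true" in any letter case enables the flag; unset or anything else disables it.
    return os.environ.get(name, default).strip().lower() == "true"


def should_delete_local(provider: str, delete_after_upload: bool) -> bool:
    # With the local provider the local file is the only copy, so never delete it.
    return delete_after_upload and provider.strip().lower() in REMOTE_PROVIDERS


def remove_single_file(path: str) -> bool:
    """Hypothetical version of the helper: delete one file, tolerating a
    missing path. Returns True if a file was actually removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    os.environ.pop("STORAGE_DELETE_LOCAL_AFTER_UPLOAD", None)
    print(env_flag("STORAGE_DELETE_LOCAL_AFTER_UPLOAD"))  # False (default)
    os.environ["STORAGE_DELETE_LOCAL_AFTER_UPLOAD"] = "True"
    flag = env_flag("STORAGE_DELETE_LOCAL_AFTER_UPLOAD")
    print(should_delete_local("s3", flag), should_delete_local("local", flag))  # True False
```

Keeping the default at `false` means an unset variable leaves existing deployments unchanged, and gating on the provider keeps the local backend from deleting its only copy.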

Changed

  • Storage Provider Performance (storage/provider.py):
    • S3StorageProvider: Added local file existence check before downloading
    • GCSStorageProvider: Added local file existence check before downloading
    • AzureStorageProvider: Added local file existence check before downloading
    • Impact: Reduces redundant network requests and improves response times
  • File Processing Flow (retrieval.py):
    • Cleanup now happens after all file processing completes
    • Prevents race conditions where files would be deleted before processing finished
  • File Download Endpoints (files.py):
    • GET /{id}/content: Cleanup scheduled after FileResponse streaming
    • GET /{id}/content/html: Cleanup scheduled after FileResponse streaming
    • GET /{id}/content/{file_name}: Cleanup scheduled after FileResponse streaming
    • Impact: Files are deleted only after complete transmission to client
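For the download endpoints, the relevant guarantee comes from Starlette, on which FastAPI is built: a response's background task runs only after the response body has been sent. The asyncio sketch below models that ordering for a chunked file download. `send_file_then_cleanup` and its parameters are hypothetical, and the 4-byte chunk size is only there so the demo produces several chunks.

```python
import asyncio
import os
import tempfile
from typing import Awaitable, Callable, List


async def send_file_then_cleanup(
    path: str,
    send: Callable[[bytes], Awaitable[None]],
    cleanup: Callable[[str], None],
    chunk_size: int = 4,
) -> None:
    """Simplified model of a file download response: stream every chunk
    first, and only afterwards run the cleanup task."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            await send(chunk)
    # Reached only after the last chunk was sent and the file handle closed.
    cleanup(path)


async def _demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "download.bin")
        with open(path, "wb") as f:
            f.write(b"0123456789")
        chunks: List[bytes] = []

        async def send(chunk: bytes) -> None:
            chunks.append(chunk)

        await send_file_then_cleanup(path, send, os.remove)
        print(len(chunks), os.path.exists(path))  # 3 False


if __name__ == "__main__":
    asyncio.run(_demo())
```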

Deprecated

  • None

Removed

  • None

Fixed

  • None

Security

  • None

Breaking Changes

  • BREAKING CHANGE: None. This is a backward-compatible feature addition:
    • Default behavior unchanged (cleanup disabled by default)
    • Existing deployments continue working without any configuration changes
    • Opt-in via environment variable

Additional Information

  • https://github.com/open-webui/open-webui/discussions/9487
  • https://github.com/open-webui/open-webui/issues/15260
  • https://github.com/open-webui/open-webui/discussions/15286

Screenshots or Videos

Screencast_20251223_093329.webm

  • Will be added

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
GiteaMirror added the pull-request label 2026-04-20 05:57:11 -05:00
Reference: github-starred/open-webui#25476