[PR #6773] [CLOSED] fix: S3 support for file upload and organise files under user #8750

Closed
opened 2025-11-11 18:05:01 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/6773
Author: @weixu365
Created: 11/7/2024
Status: Closed

Base: dev ← Head: main


📝 Commits (10+)

  • 2867cfc Clean up local file storeage when using S3 for upload
  • b3b1cae Save original file info in vector db metadata
  • 63e996f Split storage provider into multiple files
  • 15e4ef0 Use async method from aioboto3
  • c9a904f Get file using async content stream
  • 98ca452 Change asyncboto3 to boto3
  • 57794ff Clean up configs
  • 5a155c2 Delete files in uploads folder
  • d5af495 Clean up files
  • 55a55c2 Ensure folder exists before upload

📊 Changes

7 files changed (+238 additions, -223 deletions)

📝 backend/open_webui/apps/retrieval/main.py (+21 -9)
📝 backend/open_webui/apps/webui/routers/files.py (+25 -45)
📝 backend/open_webui/config.py (+4 -2)
➕ backend/open_webui/storage/base_storage_provider.py (+23 -0)
➕ backend/open_webui/storage/local_storage_provider.py (+61 -0)
📝 backend/open_webui/storage/provider.py (+16 -167)
➕ backend/open_webui/storage/s3_storage_provider.py (+88 -0)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • [x] Target branch: Please verify that the pull request targets the dev branch.
  • [x] Description: Provide a concise description of the changes made in this pull request.
  • [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • [ ] Documentation: Have you updated the relevant documentation (Open WebUI Docs) or other documentation sources?
  • [x] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • [ ] Testing: Have you written and run sufficient tests for validating the changes?
  • [ ] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • [x] Prefix: To clearly categorize this pull request, prefix the pull request title, using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

This PR fixes issue #5763 when using S3 for file upload and organises files under the user to avoid having too many files in the same folder.

Added

  • Added new environment variables USER_DATA_DIR and USER_UPLOAD_FOLDER, which control where user-uploaded files are saved.
    Not changed in this PR, but eventually both image caches and audio transcriptions should use this storage, while downloaded models should use DATA_DIR/cache/xxx.

    USER_DATA_DIR can be an S3 bucket plus prefix, so that users can control the structure of uploaded files, e.g.

    s3://some-bucket/openwebui    <- USER_DATA_DIR, default to DATA_DIR
      - uploads                   <- USER_UPLOAD_FOLDER, default to 'uploads'
        \- user_id                <- Organise files under the current user id to avoid too many files in the same folder
          \- uploaded_files1 ...
          \- uploaded_files2 ...
      - caches                    <- not touched in this PR
         \- image
         \- audio
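
The layout above can be read as a rule for building an object key from the two settings plus the user id. The following is a minimal sketch of that rule, not the code from this PR: the function name `upload_location` and its parameters are hypothetical, and it assumes that a non-`s3://` value of USER_DATA_DIR is treated as a plain local directory.

```python
from urllib.parse import urlparse


def upload_location(user_data_dir: str, upload_folder: str,
                    user_id: str, filename: str) -> tuple[str, str]:
    """Return (bucket, key) for an uploaded file.

    Hypothetical helper: bucket is "" when user_data_dir is a local path,
    in which case the key is simply a file-system path.
    """
    parsed = urlparse(user_data_dir)
    if parsed.scheme == "s3":
        # s3://some-bucket/openwebui -> bucket "some-bucket", prefix "openwebui"
        bucket = parsed.netloc
        prefix = parsed.path.strip("/")
    else:
        # Assumption: anything else is a local directory such as /app/data
        bucket = ""
        prefix = user_data_dir.rstrip("/")
    # Skip empty segments so a bucket without a prefix still yields a clean key.
    parts = [p for p in (prefix, upload_folder.strip("/"), user_id, filename) if p]
    return bucket, "/".join(parts)
```

For example, `upload_location("s3://some-bucket/openwebui", "uploads", "user-1", "report.pdf")` returns `("some-bucket", "openwebui/uploads/user-1/report.pdf")`, which matches the tree shown above.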
    

Changed

  • Add file info to the vector DB metadata when using S3
  • Organise files under the current user ID to avoid too many files in the same folder
  • Return a StreamResponse directly from the S3 response, to avoid copying to a local file
  • Use a temporary file during file upload, and delete the local file after parsing it into docs
  • Split storage providers into multiple files: base, local, and s3
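
To illustrate the base/local split and the temporary-file handling listed above, here is a rough sketch using only the standard library. It is not the code from this PR: the class names, method names, and `parse_via_temp_file` are hypothetical, and an S3 provider (omitted here) is assumed to implement the same interface.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageProvider(ABC):
    """Interface every backend would implement (hypothetical method names)."""

    @abstractmethod
    def upload_file(self, user_id: str, filename: str, data: bytes) -> str: ...

    @abstractmethod
    def get_file(self, location: str) -> bytes: ...

    @abstractmethod
    def delete_file(self, location: str) -> None: ...


class LocalStorageProvider(StorageProvider):
    """Stores files on disk under <upload_dir>/<user_id>/<filename>."""

    def __init__(self, upload_dir: str) -> None:
        self.upload_dir = Path(upload_dir)

    def upload_file(self, user_id: str, filename: str, data: bytes) -> str:
        user_dir = self.upload_dir / user_id
        # Ensure the per-user folder exists before writing.
        user_dir.mkdir(parents=True, exist_ok=True)
        target = user_dir / filename
        target.write_bytes(data)
        return str(target)

    def get_file(self, location: str) -> bytes:
        return Path(location).read_bytes()

    def delete_file(self, location: str) -> None:
        Path(location).unlink(missing_ok=True)


def parse_via_temp_file(data: bytes, parse):
    """Write data to a temporary file, pass its path to `parse`, then delete it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        tmp_path = Path(tmp.name)
    try:
        return parse(tmp_path)
    finally:
        # The local copy is removed even if parsing raises.
        tmp_path.unlink(missing_ok=True)
```

Keeping the interface in its own class means callers such as the upload router only depend on `StorageProvider`, so swapping the local backend for an S3 one would not change the calling code.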

Deprecated

Removed

Fixed

  • https://github.com/open-webui/open-webui/issues/5763

Security

Breaking Changes

  • BREAKING CHANGE: Removed the environment variables STORAGE_PROVIDER and S3_BUCKET_NAME
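
For deployments that still set the removed variables, a migration could derive a USER_DATA_DIR value from them. The sketch below is purely illustrative and is not part of this PR: the helper `legacy_user_data_dir` is hypothetical, and it assumes the old STORAGE_PROVIDER value for S3 was the string `s3`.

```python
from typing import Mapping, Optional


def legacy_user_data_dir(env: Mapping[str, str]) -> Optional[str]:
    """Map the removed settings to a USER_DATA_DIR value, or None if not applicable.

    Assumption: STORAGE_PROVIDER was set to "s3" to enable S3 storage.
    """
    provider = env.get("STORAGE_PROVIDER", "").strip().lower()
    bucket = env.get("S3_BUCKET_NAME", "").strip()
    if provider == "s3" and bucket:
        # No prefix is known from the old settings, so only the bucket is used.
        return f"s3://{bucket}"
    return None
```

Passing `os.environ` (or any mapping) keeps the helper easy to test without touching the real environment.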

Additional Information

Screenshots or Videos


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-11 18:05:01 -06:00
Reference: github-starred/open-webui#8750