mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #5763] enh: allow to use S3 for uploaded files #29644
Originally created by @hongbo-miao on GitHub (Sep 27, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/5763
Is your feature request related to a problem? Please describe.
@tjbck clarified Postgres can be used for metadata (Open WebUI config and user chat history) at https://github.com/open-webui/helm-charts/issues/83#issuecomment-2379241585 except for user uploaded files.
Describe the solution you'd like
It would be great to support using S3 for uploaded files. Thanks! ☺️
Describe alternatives you've considered
Use EBS or EFS for uploaded files.
Additional context
None
@tjbck commented on GitHub (Sep 27, 2024):
PR Welcome!
@ZhangChaoWN commented on GitHub (Sep 30, 2024):
I'm willing to contribute to this feature.
@DucNgn commented on GitHub (Oct 8, 2024):
@ZhangChaoWN
Thanks for taking this on! I'm interested in contributing to this feature as well.
Lmk if you need help finishing it!
@ZhangChaoWN commented on GitHub (Oct 12, 2024):
@DucNgn, I'm super excited that you're interested in helping out on this feature!
Here’s a quick overview of what's been done so far and what still needs attention.
Completed:
TODO:
I have forked this project and pushed my code changes. If you're interested in collaborating on the coding work, feel free to merge my code or ask me to merge yours. Feel free to point out any mistakes or suggest ways to improve the feature. If you have any ideas for additional tasks, please feel free to share them as well.
@ZhangChaoWN commented on GitHub (Oct 13, 2024):
Squashed commits and rebased onto the latest main branch in the forked repo
@LeoLiuYan commented on GitHub (Oct 14, 2024):
The content of the uploaded file will be indexed in the vector database; is it still necessary to upload it to S3? @tjbck @ZhangChaoWN
@tjbck commented on GitHub (Oct 21, 2024):
Basic S3 storage support has been added to the development branch, and everything should function as expected, except for image and audio cache handling. Testing is encouraged, and additional pull requests to extend S3 support are welcome.
@gmemstr commented on GitHub (Oct 25, 2024):
Looks like the current implementation doesn't quite work.
Testing with Cloudflare R2's S3 API: the filenames are present, but with a zero size. I think it's because `.read()` is being called multiple times? https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
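The suspected failure mode is easy to reproduce with an in-memory file: after the first `.read()` the cursor sits at end-of-file, so a second `.read()` returns empty bytes, and an upload made from that second read produces a zero-size object. A minimal illustration (not the actual Open WebUI code):

```python
import io

f = io.BytesIO(b"hello world")

first = f.read()   # consumes the stream; cursor is now at EOF
second = f.read()  # nothing left to read

assert first == b"hello world"
assert second == b""  # uploading `second` would create a 0-byte object

# Fix: rewind before reading again, or read once and reuse the bytes.
f.seek(0)
assert f.read() == b"hello world"
```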
@tjbck commented on GitHub (Oct 26, 2024):
@gmemstr Good catch, should be addressed in dev! More testing wanted here!
@nickfixit commented on GitHub (Oct 27, 2024):
What about mounting a JuiceFS filesystem?
@gmemstr commented on GitHub (Oct 31, 2024):
S3 handling still seems to be broken: the file now seems to upload properly, but it is not retrieved properly.
Log
@davizucon commented on GitHub (Nov 1, 2024):
Hey ! well done, I'm looking forward this feature :)
So, taking a look at the code, I started fixing it, but what do you think if we follow the same approach as for the vector DBs?
Where we have a common "CRUD" contract, and then in the config env we choose which implementation provider ("local" or "s3") should be instantiated. What do you think?
@tjbck commented on GitHub (Nov 3, 2024):
@davizucon external vector dbs are already supported, unsure what you meant here.
@tjbck commented on GitHub (Nov 3, 2024):
@gmemstr should be fixed on dev!
@davizucon commented on GitHub (Nov 3, 2024):
@tjbck , thanks for reply.
This is how the classes and functions are organized: instead of all functions branching with if/else, you could make specialist classes that each deal with a specific storage implementation. I mentioned the vector DBs just as an example of that organization; they follow this design.
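The organization described above can be sketched as an abstract base class with one subclass per backend, selected from a config value. All class, method, and config names below are illustrative, not the actual Open WebUI code:

```python
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Common CRUD contract each storage backend implements."""

    @abstractmethod
    def upload_file(self, contents: bytes, filename: str) -> str: ...

    @abstractmethod
    def get_file(self, file_path: str) -> bytes: ...

    @abstractmethod
    def delete_file(self, file_path: str) -> None: ...


class LocalStorageProvider(StorageProvider):
    def upload_file(self, contents, filename):
        return f"local:{filename}"  # placeholder body

    def get_file(self, file_path):
        return b""  # placeholder body

    def delete_file(self, file_path):
        pass


class S3StorageProvider(StorageProvider):
    def upload_file(self, contents, filename):
        return f"s3://bucket/{filename}"  # placeholder body

    def get_file(self, file_path):
        return b""  # placeholder body

    def delete_file(self, file_path):
        pass


def get_storage_provider(name: str) -> StorageProvider:
    """Pick the implementation from a config value, e.g. an env var."""
    return {"local": LocalStorageProvider, "s3": S3StorageProvider}[name]()
```

The callers only see the `StorageProvider` interface, so adding another backend (GCS, Azure Blob) means adding one subclass and one dictionary entry.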
@CallumJHays commented on GitHub (Nov 6, 2024):
Hi all, I'm also looking forward to this feature. I agree with @davizucon idea to organise them into separate classes:
- `StorageProvider(ABC)`
- `FileSystemStorageProvider(StorageProvider)`
- `S3StorageProvider(StorageProvider)`
Also noticed that with the existing implementation there is no pagination on `list_objects` calls, which might cause issues after 1000 uploads. There may be some frontend considerations for such large collections that need further thought. Looking to deploy with a stable release relatively soon, so I'm happy to help put together a PR if it would be accepted 😄
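The pagination concern can be sketched with a boto3-style paginator, which transparently follows continuation tokens past the 1000-object page limit. The helper below is illustrative (not Open WebUI's actual code) and takes the client as a parameter; in practice it would be `boto3.client("s3")`:

```python
def list_all_keys(s3_client, bucket: str, prefix: str = "") -> list[str]:
    """Collect every key under a prefix, following pagination.

    A bare list_objects_v2 call returns at most 1000 objects per
    page; the paginator keeps requesting pages until the listing
    is exhausted.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys: list[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```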
@tjbck commented on GitHub (Nov 6, 2024):
@CallumJHays Feel free to make an initial PR, I'll provide guidance/comment where needed!
@weixu365 commented on GitHub (Nov 7, 2024):
Hi @tjbck, I created a PR #6773 to fix the S3 bug and also split the files as mentioned by @CallumJHays and @davizucon
Other changes:
Please let me know your opinion, thanks
@tjbck commented on GitHub (Nov 7, 2024):
@weixu365 I'd appreciate if you could split them up to atomic PRs.
@weixu365 commented on GitHub (Nov 7, 2024):
Hi @tjbck , If you are fine with splitting the file into multiple files, then the following three things need to be in a single PR:
The following two can be separate PRs:
Please let me know your opinion.
@weixu365 commented on GitHub (Nov 7, 2024):
I can also create PRs in the following order:
The 2nd and 3rd are slightly coupled, so it would be easier to merge them into one PR.
@weixu365 commented on GitHub (Nov 11, 2024):
Hi @tjbck, is there any update on the PR? I can split it into a couple of small PRs, but I need your guidance on which of the 6 atomic changes above would be accepted.
@Mavial commented on GitHub (Nov 15, 2024):
Hey @tjbck, does this mean that the S3 bugfixes are delayed until v5?
@lewis-ing commented on GitHub (Nov 19, 2024):
So, I uploaded a file to S3, but the console printed this error:
```
INFO  [open_webui.apps.webui.routers.files] file.content_type: application/pdf
ERROR [open_webui.apps.retrieval.main] list index out of range
Traceback (most recent call last):
  File "D:\sourcecode\soft-factory\ai\open-webui\backend\open_webui\apps\retrieval\main.py", line 835, in process_file
    file_path = Storage.get_file(file_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\sourcecode\soft-factory\ai\open-webui\backend\open_webui\storage\provider.py", line 144, in get_file
    return self._get_file_from_s3(file_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\sourcecode\soft-factory\ai\open-webui\backend\open_webui\storage\provider.py", line 71, in _get_file_from_s3
    bucket_name, key = file_path.split("//")[1].split("/")
                       ~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
ERROR [open_webui.apps.webui.routers.files] 400: list index out of range
Traceback (most recent call last):
  File "D:\sourcecode\soft-factory\ai\open-webui\backend\open_webui\apps\retrieval\main.py", line 835, in process_file
    file_path = Storage.get_file(file_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\sourcecode\soft-factory\ai\open-webui\backend\open_webui\storage\provider.py", line 144, in get_file
    return self._get_file_from_s3(file_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\sourcecode\soft-factory\ai\open-webui\backend\open_webui\storage\provider.py", line 71, in _get_file_from_s3
    bucket_name, key = file_path.split("//")[1].split("/")
                       ~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\sourcecode\soft-factory\ai\open-webui\backend\open_webui\apps\webui\routers\files.py", line 71, in upload_file
    process_file(ProcessFileForm(file_id=id))
  File "D:\sourcecode\soft-factory\ai\open-webui\backend\open_webui\apps\retrieval\main.py", line 903, in process_file
    raise HTTPException(
fastapi.exceptions.HTTPException: 400: list index out of range
ERROR [open_webui.apps.webui.routers.files] Error processing file: 54add99d-1754-431d-8935-8ecd82794aae
INFO:     127.0.0.1:49411 - "POST /api/v1/files/ HTTP/1.1" 200 OK
```
@lewis-ing commented on GitHub (Nov 19, 2024):
So I figured it out and modified the provider.py file.
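The traceback above points at `file_path.split("//")[1].split("/")`, which raises an IndexError for any path without a `//` (e.g. a plain local path) and a ValueError whenever the object key itself contains slashes. A hedged sketch of a more defensive parse (the helper name is hypothetical, not the actual fix in provider.py):

```python
def split_s3_path(file_path: str) -> tuple[str, str]:
    """Split 's3://bucket/key/with/slashes' into (bucket, key).

    The one-liner
        bucket_name, key = file_path.split("//")[1].split("/")
    fails twice: a path without '//' has no element [1]
    (IndexError), and a key containing '/' unpacks into more
    than two parts (ValueError).
    """
    if not file_path.startswith("s3://"):
        raise ValueError(f"not an S3 path: {file_path!r}")
    # partition splits only on the first '/', so the key keeps
    # its internal slashes intact
    bucket, _, key = file_path[len("s3://"):].partition("/")
    return bucket, key
```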
@RobinBially commented on GitHub (Nov 19, 2024):
#7040
@freeload101 commented on GitHub (Dec 23, 2024):
Why not just use one of the other million ways to mount S3 to a path, like s3fs... or something that does not use FUSE.
@dallenpyrah commented on GitHub (Jan 8, 2025):
Really looking forward to this feature; our team wants to upload .txt files of our codebase to S3 on merges so we can chat with our entire system in OpenWebUI.
@rragundez commented on GitHub (Jan 15, 2025):
For splitting into classes without modifying the logic in the code: #8580
Once that is merged (after any modifications), I will add the GCS storage provider, which is actually my reason for contributing, since I need it and don't want to hack my way into it.
@tjbck commented on GitHub (Jan 17, 2025):
Testing wanted with the latest dev! Might've resolved a lot of issues you guys were facing!
@rragundez commented on GitHub (Jan 19, 2025):
Hi @RobinBially @lewis-ing, I added the PR with the tests and the refactoring of the Storage classes. The logic seems correct to me, so could it be that you were using local storage and then changed to S3 storage while using the same DB or Docker volume? If so, there might indeed be issues, because the file is saved to the database with an ID and a property called path, which determines whether it is local or S3. So if it was saved the first time with one storage provider and the provider was then changed, there would be a mismatch: the path in the database would point to local while the function being called to load the file would be the S3 one (for example).
If you still see the error when starting from a clean environment (DB and Docker volume), please post it here with the scenario, and I can add a test for it and try to solve it.
@hongbo-miao since you opened this Issue can you also double check? thanks
@rragundez commented on GitHub (Jan 19, 2025):
I do think one decision remains: when using an external provider (e.g. S3, GCS, Azure Blob, etc.), should the application still interact with the local filesystem? Right now, when using S3, files are saved to both locations on upload and download.
I think this one is for you to decide @tjbck. I can help with the implementation.
My 2 cents: from an application, deployment, and scaling point of view, the application should never store data itself, but only use external sources, or at least offer the possibility to do so. There are known issues when the application holds data: data deletion on app failure or redeployment, a bloated filesystem, no single source of truth, and different behavior for users depending on which VM/pod their request lands on (distributed deployment).
There are methods to mitigate this, such as the infrastructure mounting a single shared filesystem into each VM/pod, but that solution is infrastructure-side rather than application-side, and I guess it would shrink the audience that could use this feature properly, since infrastructure is a separate capability.
Now, it is true that other issues might affect the connection to the external storage, such as internet connectivity or firewalls. To be somewhat resilient to this, there could be a fallback mechanism: if a file cannot be retrieved from external storage, try to retrieve it locally. For me, that argument is not enough to outweigh the one in favor of external-only storage, unless the user explicitly indicates otherwise. Perhaps, even though it adds complexity, the fallback of also storing files locally could be triggered only by a flag the user sets explicitly.
In conclusion, I would go for storing only in external storage and then see how that goes.
@antoinebou12 commented on GitHub (Jan 21, 2025):
Can you also add MinIO support?
@tjbck commented on GitHub (Jan 22, 2025):
The refactor has been merged to main, testing wanted here!
@Mavial commented on GitHub (Jan 25, 2025):
I've been using it in prod for 40 users since it was pushed to dev and have found no issues with the S3 provider.
@rragundez commented on GitHub (Jan 25, 2025):
Thanks for the feedback. @tjbck should we close the issue?
@spammenotinoz commented on GitHub (Jan 28, 2025):
Agree entirely.
@JoeChen2me commented on GitHub (Feb 15, 2025):
I have identified a problem: S3 storage is limited to AWS S3 and does not support other compatible services such as Cloudflare R2, particularly concerning the configuration of the S3_REGION_NAME environment variable.
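For S3-compatible services, the usual fix is to pass an explicit `endpoint_url` to the boto3 client; Cloudflare R2, for instance, documents the literal region `auto`. A minimal sketch of building the client configuration from environment variables, where the variable names are illustrative and not necessarily the ones Open WebUI uses:

```python
import os


def s3_client_kwargs() -> dict:
    """Build keyword arguments for boto3.client("s3") from env vars,
    adding endpoint_url so S3-compatible services (Cloudflare R2,
    MinIO, IONOS) can be targeted instead of AWS."""
    kwargs = {
        # R2 expects the literal region "auto"; AWS wants a real region.
        "region_name": os.environ.get("S3_REGION_NAME", "auto"),
        "aws_access_key_id": os.environ.get("S3_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.environ.get("S3_SECRET_ACCESS_KEY"),
    }
    endpoint = os.environ.get("S3_ENDPOINT_URL")
    if endpoint:
        # e.g. https://<account_id>.r2.cloudflarestorage.com for R2
        kwargs["endpoint_url"] = endpoint
    return kwargs
```

The resulting dict would be splatted into `boto3.client("s3", **s3_client_kwargs())`; without `endpoint_url`, boto3 always targets AWS regardless of credentials.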
@Mavial commented on GitHub (Feb 15, 2025):
I've been using S3 on IONOS with no problems. Please elaborate on your exact problem?
@JoeChen2me commented on GitHub (Feb 15, 2025):
You can view the issue I created: ISSUE
Also, have you updated to the latest version (0.5.12)?
@blegry commented on GitHub (May 19, 2025):
Agreed.
I conducted some behavior tests. My environment: OWUI, external S3, external vector DB, external orcization:
--> In both cases, the system continues to function properly. I can chat over the chunks and download files that were "locally deleted." This means that S3 is sufficient.
So, what is the purpose of this? Are there any scenarios where it breaks something? When downloading a file, the files on the container reappear in ./data/uploads.