Configurable options for different file formats for RAG #2738

Closed
opened 2025-11-11 15:13:22 -06:00 by GiteaMirror · 0 comments

Originally created by @dsjath on GitHub (Nov 21, 2024).

Feature Request

Is your feature request related to a problem? Please describe.
I am frustrated that I cannot distinguish between file types that should be automatically vectorized and file types that should not.
I want to be able to upload large CSV/Excel files without having them embedded right away. Text formats such as txt/Word/PDF, on the other hand, should be embedded immediately on upload.

I am using pipelines to customize the experience, so I can, for example, parse CSV/Excel files with my own code and embed only certain parts of the file inside the pipeline, whereas I just want txt files to be embedded as usual.
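
To illustrate the kind of selective embedding meant here, a minimal sketch of a helper that a pipeline might call: it reads a CSV and turns only chosen columns into text chunks that could then be embedded. The function name `rows_to_chunks` and the column names are hypothetical, and nothing here is part of the Open WebUI pipelines API.

```python
import csv
import io


def rows_to_chunks(csv_text, columns):
    """Build one text chunk per CSV row, using only the selected columns.

    Hypothetical helper for illustration; empty cells are skipped and rows
    with no usable values produce no chunk.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row in reader:
        parts = [f"{col}: {row[col]}" for col in columns if row.get(col)]
        if parts:
            chunks.append("; ".join(parts))
    return chunks
```

With a file whose header is `id,name,notes`, calling `rows_to_chunks(text, ["name", "notes"])` would leave the `id` column out of whatever gets embedded.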

Describe the solution you'd like
A YAML config file that allows customizing how uploads of different file types are handled.
E.g.
disallow upload
allow upload but block RAG
allow upload and use RAG
file size limits for different file types
allow a different maximum number of files for different formats (which must be less than the overall maximum)
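
The options above could be sketched roughly as follows. This is only an illustration of the requested behavior, not an existing Open WebUI feature: the names `Mode`, `FilePolicy`, `POLICIES`, `GLOBAL_MAX_FILES` and `decide`, and all limits shown, are assumptions, and the policy table is written as a Python dict standing in for the proposed YAML file.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class Mode(Enum):
    DISALLOW = "disallow"            # reject the upload
    UPLOAD_ONLY = "upload_only"      # store the file, do not embed it
    UPLOAD_AND_RAG = "upload_rag"    # store the file and embed it right away


@dataclass(frozen=True)
class FilePolicy:
    mode: Mode
    max_size_mb: float
    max_files: int


# Hypothetical values; in the proposal these would come from a YAML config.
GLOBAL_MAX_FILES = 10
DEFAULT_POLICY = FilePolicy(Mode.DISALLOW, 0, 0)
POLICIES = {
    ".csv": FilePolicy(Mode.UPLOAD_ONLY, 200, 3),
    ".xlsx": FilePolicy(Mode.UPLOAD_ONLY, 200, 3),
    ".txt": FilePolicy(Mode.UPLOAD_AND_RAG, 20, 10),
    ".pdf": FilePolicy(Mode.UPLOAD_AND_RAG, 50, 5),
    ".docx": FilePolicy(Mode.UPLOAD_AND_RAG, 50, 5),
}


def decide(filename, size_mb, already_uploaded_of_type):
    """Return (accepted, embed_now, reason) for a candidate upload."""
    ext = Path(filename).suffix.lower()
    policy = POLICIES.get(ext, DEFAULT_POLICY)
    if policy.mode is Mode.DISALLOW:
        return (False, False, "file type not allowed")
    if size_mb > policy.max_size_mb:
        return (False, False, "file too large for this type")
    # The per-type limit can never exceed the overall limit.
    limit = min(policy.max_files, GLOBAL_MAX_FILES)
    if already_uploaded_of_type >= limit:
        return (False, False, "too many files of this type")
    return (True, policy.mode is Mode.UPLOAD_AND_RAG, "ok")
```

Under these assumed values, a 50 MB CSV would be accepted but not embedded, while a small txt file would be accepted and embedded immediately.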

Describe alternatives you've considered
I have tried disallowing uploads entirely, but then I have to build awkward API integrations to access the files from a folder instead, which is not a friendly user experience.



Reference: github-starred/open-webui#2738