mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-10 15:54:15 -05:00
issue: RAG_ALLOWED_FILE_EXTENSIONS is too rigid and doesn't support all filetypes in accordance to chosen Content Extraction Engine
#5473
Originally created by @Hisma on GitHub (Jun 7, 2025).
Check Existing Issues
Installation Method
Docker Compose
Open WebUI Version
0.6.13
Operating System
Ubuntu 24.04
Confirmation
README.md.
Expected Behavior
- Image files (`.png`, `.jpg`, etc.) should be uploadable to a RAG knowledge base when using a Content Extraction Engine that supports image OCR (e.g. `datalab_marker`, but this bug applies to all engines except `external`).
- An empty `Allowed File Extensions` setting should truly mean “allow all file types,” or at minimum auto-whitelist those supported by the chosen engine.
Actual Behavior
- Files with an `image/*` MIME type are rejected, for any Document Extraction Engine other than `external`.
Steps to Reproduce
Environment
- `datalab_marker` (configured in UI in Document Settings)
Steps to reproduce
- … `external`, click Save.
Docker Log
Problematic Code Snippet
In `backend/open_webui/routers/files.py`, around line 171, the engine-based logic currently looks like this:
[code snippet missing]
This means that in `open_webui/routers/retrieval.py`, images and videos are only passed to `process_file()` when the engine equals `"external"`. This hard-codes a block/allow rule for image and video files based on the engine name, rather than aligning the whitelist with the actual capabilities of the chosen OCR engine.
I edited the conditional logic referenced above from `external` to `datalab_marker` and successfully uploaded an image, which shows that this is where the problematic code lies.
Suggested Fixes
- Make `RAG_ALLOWED_FILE_EXTENSIONS` dynamic based on the file types the selected Content Extraction Engine actually supports. For example, `datalab_marker` has a `mime_map` that lists supported MIME types, defined in the `retrieval/datalab_marker.py` file. Each loader would need a `mime_map` that's specific to that loader.
- Make `RAG_ALLOWED_FILE_EXTENSIONS` truly allow all file types by default when left empty (including video, image, audio, etc.), and enable the end user to whitelist file types manually via the "Allowed File Extensions" option, without any pre-existing restrictions. If the selected engine encounters an error trying to upload a file to knowledge, return a generic "file type not supported" message.
Either option will allow users to upload any file type that their chosen OCR engine actually supports.
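To make the difference concrete, here is a rough Python sketch contrasting the engine-name gate described in this issue with a capability-based gate along the lines of the suggested fixes. It is not the actual Open WebUI code: the function names, the `ENGINE_MIME_PATTERNS` table and its entries, and the assumption that allowed extensions are stored without a leading dot are all hypothetical and only there for illustration.

```python
from fnmatch import fnmatch

# Hypothetical capability table, keyed by engine name. The issue only says
# that datalab_marker has a mime_map; the patterns here are made up.
ENGINE_MIME_PATTERNS: dict[str, list[str]] = {
    "datalab_marker": ["application/pdf", "image/*"],
}


def engine_name_gate(content_type: str, engine: str) -> bool:
    """Gate as described in the issue: images and videos are only
    processed when the engine is exactly "external"."""
    is_media = content_type.startswith(("image/", "video/"))
    return (not is_media) or engine == "external"


def capability_gate(
    content_type: str,
    filename: str,
    engine: str,
    allowed_extensions: list[str],
) -> bool:
    """Sketch of the suggested behavior.

    - An empty allowed_extensions list means "no extension restriction".
    - Otherwise the file extension must be listed (assumed: no leading dot).
    - The MIME type must match a pattern the engine declares as supported.
    """
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if allowed_extensions and ext not in allowed_extensions:
        return False

    patterns = ENGINE_MIME_PATTERNS.get(engine)
    if patterns is None:
        # Unknown engine: let it try, and surface a generic
        # "file type not supported" error if extraction fails.
        return True
    return any(fnmatch(content_type, pattern) for pattern in patterns)


if __name__ == "__main__":
    # The name-based gate blocks an image for datalab_marker ...
    print(engine_name_gate("image/png", "datalab_marker"))
    # ... while the capability-based gate lets it through.
    print(capability_gate("image/png", "scan.png", "datalab_marker", []))
```

With a table like this, adding image support for another engine would mean adding an entry for it rather than editing a conditional on the engine name.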
@licaon-kter commented on GitHub (Jul 3, 2025):
Having the same issue with Tika: it can OCR pictures, but open-webui rejects these files right away.