mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #20948] feat: Preserve File Metadata in Pipelines & Implement Customizable Loader Hooks #58006
Originally created by @burakkilic11 on GitHub (Jan 26, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20948
Problem Description
Currently, when a file is uploaded via the chat "+" button, the internal document loader automatically processes the file, extracts the text, and strips all original file references (the path, the id, and the files object) before the request reaches the Pipelines (Filters).
Even with a shared volume setup (e.g., mapping /app/backend/data/uploads to both OpenWebUI and Pipelines containers), there is no way for a filter to know which file belongs to the current message because the references are removed from the body and kwargs. This prevents developers from implementing custom OCR (Tesseract), specialized layout analysis, or private local processing within the Pipeline framework.
Desired Solution
I would like to see two main improvements:
Metadata Preservation: Ensure the original files object (containing id, filename, and path) remains included in the body payload sent to Pipelines, even after the document loader has processed the file.
Loader Hooks: Implement a mechanism that allows Pipelines to "hook" into or override the core document loader stage. This would enable developers to replace the default text extraction logic with custom solutions (like specialized local OCR) directly within the OpenWebUI ecosystem.
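To make the loader-hook request concrete, here is a minimal sketch of what such a mechanism could look like. Everything in it is hypothetical: `register_loader_hook`, `load_document`, `default_loader`, and the metadata shape (`id`, `filename`, `path`) are illustrative names based on the wording of this issue, not existing Open WebUI or Pipelines APIs.

```python
from typing import Callable, Dict, Optional

# Hypothetical hook signature: receives the file metadata and returns the
# extracted text, or None to fall back to the default loader.
LoaderHook = Callable[[dict], Optional[str]]

_loader_hooks: Dict[str, LoaderHook] = {}


def register_loader_hook(extension: str, hook: LoaderHook) -> None:
    """Register a custom extractor for a file extension (hypothetical API)."""
    _loader_hooks[extension.lower()] = hook


def default_loader(file_meta: dict) -> str:
    """Stand-in for the built-in text extraction."""
    return f"[default extraction of {file_meta['filename']}]"


def load_document(file_meta: dict) -> dict:
    """Run a registered hook if one matches, else the default loader.

    The original file metadata is returned next to the text, so later
    stages can still identify the file.
    """
    ext = file_meta["filename"].rsplit(".", 1)[-1].lower()
    hook = _loader_hooks.get(ext)
    text = hook(file_meta) if hook else None
    if text is None:
        text = default_loader(file_meta)
    return {"text": text, "file": file_meta}


# Usage: a custom OCR step for PNG files (the lambda is a placeholder for
# a real OCR call such as Tesseract).
register_loader_hook("png", lambda meta: f"[custom OCR of {meta['path']}]")
result = load_document(
    {"id": "abc", "filename": "scan.png", "path": "/app/backend/data/uploads/abc_scan.png"}
)
```

Returning the metadata together with the text covers both requests at once: the hook replaces extraction, and the file reference is carried forward instead of being discarded.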
Alternatives Considered
Shared Volumes: We tried using shared volumes to access the files directly, but because the file_id and path are stripped from the JSON payload, the Pipeline cannot identify the correct file on disk.
Manual RAG: We considered disabling RAG, but the document loader still executes by default upon upload, resulting in the same metadata loss.
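If a file reference were preserved in the payload, the shared-volume approach would become workable. The sketch below shows how a filter might resolve a reference to a file on disk. The function `resolve_upload` is hypothetical, and the `<id>_<filename>` naming scheme for uploaded files is an assumption that should be checked against the actual contents of the uploads directory.

```python
from pathlib import Path
from typing import Optional


def resolve_upload(upload_dir: Path, file_ref: dict) -> Optional[Path]:
    """Map a file reference from the payload to a file on the shared volume.

    `file_ref` is assumed to carry "path" and/or "id", as requested in
    this issue. Returns None when nothing can be resolved.
    """
    # Prefer an explicit path when the payload carries one and it exists.
    path = file_ref.get("path")
    if path and Path(path).is_file():
        return Path(path)

    # Otherwise fall back to searching by id.
    # ASSUMPTION: uploads are stored as "<id>_<original filename>".
    file_id = file_ref.get("id")
    if not file_id:
        return None
    matches = sorted(upload_dir.glob(f"{file_id}_*"))
    return matches[0] if matches else None
```

With the current payload, `file_ref` would be empty and the function would return `None`, which is exactly the limitation described above.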
Additional Context
The payload currently received by the pipeline contains the extracted text but none of the file references needed for advanced processing:
{
  "stream": false,
  "model": "your_model_id",
  "messages": [{
    "role": "user",
    "content": "### Task: ... [Extracted text is present, but file references are missing] ..."
  }],
  "user": { "name": "...", "role": "admin" }
}
Providing a way to intercept the file before or during the loading phase would significantly expand the extensibility of OpenWebUI.
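For comparison with the payload above, the following sketch shows how a filter could consume a preserved `files` entry. The class follows the general shape of a Pipelines filter (a `Pipeline` class with an async `inlet` method), but the `files` key and its fields (`id`, `filename`, `path`) are the behavior requested here, not something the current payload provides, and the example values are placeholders.

```python
import asyncio
from typing import List, Optional


class Pipeline:
    """Minimal filter-shaped class; a real filter would typically define more."""

    def __init__(self):
        self.type = "filter"
        self.name = "File Reference Filter"
        self.seen_files: List[dict] = []

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # ASSUMPTION: body["files"] is preserved as requested in this issue.
        files = body.get("files") or []
        self.seen_files = [
            {"id": f.get("id"), "filename": f.get("filename"), "path": f.get("path")}
            for f in files
        ]
        # Custom OCR or layout analysis could be triggered here per file.
        return body


# Hypothetical payload with the file reference kept next to the messages.
desired_body = {
    "stream": False,
    "model": "your_model_id",
    "messages": [{"role": "user", "content": "### Task: ..."}],
    "files": [
        {
            "id": "example-file-id",
            "filename": "scan.pdf",
            "path": "/app/backend/data/uploads/example-file-id_scan.pdf",
        }
    ],
}

pipeline = Pipeline()
asyncio.run(pipeline.inlet(desired_body))
```

Run against the current payload, which has no `files` key, the same filter ends up with an empty `seen_files` list.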