mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[PR #13518] [CLOSED] fix: Resolve AttributeError for TikaLoader by passing kwargs #23221
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/13518
Author: @kidd808
Created: 5/5/2025
Status: ❌ Closed
Base: main ← Head: fix-tika-loader-kwargs
📝 Commits (1)
2e2c22f Fix: Pass kwargs to TikaLoader during initialization
📊 Changes
1 file changed (+3 additions, -1 deletions)
📝 backend/open_webui/retrieval/loaders/main.py (+3 -1)
📄 Description
Pull Request Checklist
Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.
Before submitting, make sure you've checked the following:
dev branch.
Changelog Entry
Description
Fixes an AttributeError: 'TikaLoader' object has no attribute 'kwargs' that occurred during file processing when Tika is selected as the document engine. This error prevented configurations like PDF_EXTRACT_IMAGES from being correctly applied. The root cause was that the necessary keyword arguments (kwargs) containing these configurations were not being passed from the main Loader class to the TikaLoader instance during initialization. This PR ensures these arguments are correctly propagated.
Added
N/A
Changed
Modified Loader._get_loader method in backend/open_webui/retrieval/loaders/main.py to pass the self.kwargs dictionary to the TikaLoader constructor.
Modified TikaLoader.__init__ method in backend/open_webui/retrieval/loaders/main.py to accept an optional loader_kwargs argument and store it as self.kwargs, initializing to an empty dict if not provided.
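The two changes above can be sketched roughly as follows. This is a simplified illustration, not the actual diff: the constructor parameters other than loader_kwargs, the TIKA_SERVER_URL key, and the extract_images_enabled helper are assumptions made for the example, and the real classes contain considerably more logic.

```python
from typing import Optional


class TikaLoader:
    """Simplified stand-in for the real loader (assumed signature)."""

    def __init__(
        self,
        url: Optional[str],
        file_path: str,
        mime_type: Optional[str] = None,
        loader_kwargs: Optional[dict] = None,
    ):
        self.url = url
        self.file_path = file_path
        self.mime_type = mime_type
        # Keep the forwarded settings; fall back to an empty dict if none given.
        self.kwargs = loader_kwargs or {}

    def extract_images_enabled(self) -> bool:
        # Hypothetical helper mirroring the lookup that load() performs.
        return bool(self.kwargs.get("PDF_EXTRACT_IMAGES"))


class Loader:
    """Simplified stand-in for the main Loader class."""

    def __init__(self, engine: str = "", **kwargs):
        self.engine = engine
        self.kwargs = kwargs

    def _get_loader(self, filename: str, file_content_type: str, file_path: str):
        # Only the Tika branch is sketched; the configuration dict is now
        # handed to TikaLoader instead of being dropped.
        return TikaLoader(
            url=self.kwargs.get("TIKA_SERVER_URL"),  # assumed config key
            file_path=file_path,
            mime_type=file_content_type,
            loader_kwargs=self.kwargs,
        )
```

Defaulting to an empty dict keeps any caller that constructs TikaLoader without loader_kwargs working, since self.kwargs.get(...) then simply returns None.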
Deprecated
N/A
Removed
N/A
Fixed
Resolved AttributeError: 'TikaLoader' object has no attribute 'kwargs' when using Tika for document parsing, allowing configurations like PDF_EXTRACT_IMAGES to function correctly.
Security
N/A
Breaking Changes
N/A
Additional Information
This fix directly addresses the traceback reported where TikaLoader.load() attempted to access self.kwargs.get("PDF_EXTRACT_IMAGES") but self.kwargs did not exist on the object instance.
Manual testing confirmed that file uploads using the Tika engine now complete successfully without the AttributeError after applying these changes.
Although the template suggests opening a discussion first, this PR directly addresses a specific AttributeError identified from error logs and aims to restore previously working functionality.
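For reference, the failure mode described above can be reproduced in isolation with a minimal sketch. BrokenTikaLoader and reproduce are hypothetical names used only for this illustration; the class imitates a loader whose constructor never sets self.kwargs, which is the condition the traceback points to.

```python
class BrokenTikaLoader:
    """Hypothetical stand-in for the pre-fix loader: __init__ never sets self.kwargs."""

    def __init__(self, url, file_path, mime_type=None):
        self.url = url
        self.file_path = file_path
        self.mime_type = mime_type

    def load(self):
        # Same lookup the description cites; self.kwargs was never assigned.
        return self.kwargs.get("PDF_EXTRACT_IMAGES")


def reproduce() -> str:
    """Return the AttributeError message raised by load(), or '' if none."""
    try:
        BrokenTikaLoader("http://localhost:9998", "/tmp/example.pdf").load()
    except AttributeError as exc:
        return str(exc)
    return ""
```

Calling reproduce() yields a message ending in "has no attribute 'kwargs'", matching the shape of the reported error.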
Screenshots or Videos
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.