mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[GH-ISSUE #18531] issue: Compatibility Issues with Newer Tika Versions Break Tika Server Access and OCR Functionality in Non-Docker Deployments #57292
Originally created by @AnomeZ on GitHub (Oct 23, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18531
Check Existing Issues
Installation Method
Pip Install
Open WebUI Version
v0.6.34
Ollama Version (if applicable)
No response
Operating System
Ubuntu 22.04
Browser (if applicable)
No response
Confirmation
Expected Behavior
Due to constraints on my server, I cannot deploy any applications or services with Docker. I therefore installed Open WebUI via pip, as outlined in the documentation, and also deployed the Apache Tika service without Docker. However, when I tried to use the OCR functionality by uploading a scanned PDF, Open WebUI could not connect to or use the Tika service.
After verifying that my configuration and port settings were correct, I examined the Open WebUI source code and traced the issue to the TikaLoader class in open_webui/open_webui/retrieval/loaders/main.py.
The problem lies in the endpoint that TikaLoader requests from Tika. It uses the path http://localhost:9998/tika/text, but newer versions of Tika do not have this "/text" endpoint. In addition, TikaLoader does not send some required request headers, so the OCR functionality is not invoked correctly.
After changing the request path and adding the required headers (X-Tika-PDFOcrStrategy and X-Tika-OCRLanguage), I was able to use the Tika service normally within Open WebUI.
I hope the Open WebUI team can update this part of the code to make it more robust and compatible with newer Tika versions.
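For illustration only, here is a minimal sketch of the kind of request described above. It is not the TikaLoader code from Open WebUI and not the exact patch from this report. The function names (tika_endpoint, tika_headers, extract_text) are hypothetical, and the default values "ocr_only" and "eng" are assumptions chosen for the example. It sends a PUT to the plain /tika endpoint with an Accept: text/plain header and the two OCR headers named above, and it accepts a base URL with or without a trailing /tika so the path segment is not duplicated.

```python
import urllib.request


def tika_endpoint(base_url: str) -> str:
    """Return the plain /tika endpoint for a configured base URL.

    Accepts both "http://host:9998" and "http://host:9998/tika" so that
    the "/tika" segment is never appended twice.
    """
    base = base_url.rstrip("/")
    return base if base.endswith("/tika") else base + "/tika"


def tika_headers(mime_type: str = "application/pdf",
                 ocr_strategy: str = "ocr_only",
                 ocr_language: str = "eng") -> dict:
    """Build request headers; the default values are assumptions for this sketch."""
    return {
        "Content-Type": mime_type,
        "Accept": "text/plain",                 # ask Tika for plain text
        "X-Tika-PDFOcrStrategy": ocr_strategy,  # header named in this report
        "X-Tika-OCRLanguage": ocr_language,     # header named in this report
    }


def extract_text(base_url: str, data: bytes, **header_options) -> str:
    """PUT the raw file bytes to Tika and return the extracted text."""
    request = urllib.request.Request(
        tika_endpoint(base_url),
        data=data,
        headers=tika_headers(**header_options),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a Tika server running locally, a call such as extract_text("http://localhost:9998/tika", pdf_bytes) would exercise this path. Whether OCR actually runs also depends on Tesseract being installed on the Tika host.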
Actual Behavior
Error calling Tika: Not Found
Steps to Reproduce
1. Install Ubuntu 22.04 on your server.
2. Install Open WebUI using pip.
3. Install OpenJDK 11 using the command sudo apt-get install openjdk-11-jdk, then download and run the Tika server with java -jar tika-server-standard-3.1.0.jar.
4. Open Open WebUI, navigate to the settings, and configure the Tika server URL, typically as http://localhost:9998/tika.
5. Start a new chat session, then drag and drop a scanned PDF file to upload it.
Logs & Screenshots
HTTPConnectionPool(host='127.0.0.1', port=80): Max retries exceeded with url: /:9998/tika/tika/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa3900fad50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Here is my modified code:
Additional Information
If you would like me to provide more details about the root cause of this issue or elaborate further on my solution, please feel free to leave a comment below.