[GH-ISSUE #18531] issue: Compatibility Issues with Newer Tika Versions Break Tika Server Access and OCR Functionality in Non-Docker Deployments #18626

Closed
opened 2026-04-20 00:50:20 -05:00 by GiteaMirror · 0 comments

Originally created by @AnomeZ on GitHub (Oct 23, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18531

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Pip Install

Open WebUI Version

v0.6.34

Ollama Version (if applicable)

No response

Operating System

Ubuntu 22.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Due to constraints on my server, I am unable to deploy any applications or services using Docker. Consequently, I deployed Open WebUI via pip as outlined in the documentation, and also deployed the Apache Tika service without Docker. However, when I tried to use the OCR functionality by uploading a scanned PDF, Open WebUI could not connect to or use the Tika service.

After verifying that my configuration and port settings were correct, I examined the Open WebUI source code and traced the issue to the TikaLoader class in open_webui/open_webui/retrieval/loaders/main.py.

The problem lies in the request endpoint that TikaLoader sends to Tika. It uses the path http://localhost:9998/tika/text, but newer versions of Tika do not have this "/text" endpoint. Furthermore, the TikaLoader class does not set some request headers that are needed to invoke OCR correctly.
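
The doubled path in the log further down (/tika/tika/text) is consistent with the loader appending "tika/text" to whatever URL is configured, so a configured value that already ends in /tika gets that segment twice. Below is a minimal sketch of that string handling, assuming this append behaviour; build_endpoint and strip_tika_suffix are hypothetical helpers for illustration, not Open WebUI code.

```python
def build_endpoint(configured_url: str, suffix: str = "tika/text") -> str:
    """Append a fixed resource path to the configured URL.

    Hypothetical helper: assumes the loader appends "tika/text" to
    whatever URL the user configured, as the log suggests.
    """
    base = configured_url if configured_url.endswith("/") else configured_url + "/"
    return base + suffix


def strip_tika_suffix(configured_url: str) -> str:
    """Hypothetical normalizer: drop a trailing "/tika" so the appended
    path is not doubled (".../tika" becomes the server root)."""
    url = configured_url.rstrip("/")
    if url.endswith("/tika"):
        url = url[: -len("/tika")]
    return url


if __name__ == "__main__":
    configured = "http://localhost:9998/tika"
    print(build_endpoint(configured))                     # doubled "tika" segment
    print(build_endpoint(strip_tika_suffix(configured)))  # server root + "tika/text"
```

With the value from step 4 below, the first call yields http://localhost:9998/tika/tika/text, the same path that appears in the log; normalizing to the server root first yields http://localhost:9998/tika/text.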

After changing the request path and adding the required headers (X-Tika-PDFOcrStrategy and X-Tika-OCRLanguage), I was able to use the Tika service normally within Open WebUI.
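
The modified code itself is only available as a screenshot, so the following is not necessarily the exact change. It is a minimal sketch of a Tika request that sets the two OCR headers named above. Assumptions: the server root (e.g. http://localhost:9998) is passed in, the request goes to Tika server's PUT /tika resource with Accept: text/plain, and Tesseract is installed on the host running Tika (Tika delegates OCR to it). build_tika_ocr_request and extract_text are hypothetical names.

```python
import urllib.request


def build_tika_ocr_request(
    base_url: str,
    pdf_bytes: bytes,
    ocr_strategy: str = "ocr_only",
    ocr_language: str = "eng",
) -> urllib.request.Request:
    """Build (but do not send) a PUT request to Tika's /tika resource."""
    endpoint = base_url.rstrip("/") + "/tika"
    headers = {
        "Accept": "text/plain",                 # ask Tika for plain text output
        "Content-Type": "application/pdf",      # declare what is being uploaded
        "X-Tika-PDFOcrStrategy": ocr_strategy,  # e.g. "ocr_only" or "auto"
        "X-Tika-OCRLanguage": ocr_language,     # Tesseract language code(s), e.g. "eng"
    }
    return urllib.request.Request(endpoint, data=pdf_bytes, headers=headers, method="PUT")


def extract_text(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the decoded body (raises on HTTP errors)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage, assuming a local scanned file named scan.pdf: `with open("scan.pdf", "rb") as f: print(extract_text(build_tika_ocr_request("http://localhost:9998", f.read())))`. Building the request separately from sending it keeps the header logic testable without a running server.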

I hope the Open WebUI team can update this part of the code to improve its robustness and compatibility with newer Tika versions.

Actual Behavior

Error calling Tika: Not Found

Steps to Reproduce

1. Install Ubuntu 22.04 on your server.

2. Install Open WebUI using pip.

3. Install OpenJDK 11 with sudo apt-get install openjdk-11-jdk, then download the Tika server and run it with java -jar tika-server-standard-3.1.0.jar.

4. Open Open WebUI, navigate to the settings, and set the Tika server URL, typically http://localhost:9998/tika.

5. Start a new chat session and drag and drop a scanned PDF file to upload it.
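
Before uploading a file, it can help to confirm that the Tika server started in step 3 actually answers on its port. A minimal sketch, assuming the server root http://localhost:9998 and Tika server's GET /version resource; tika_is_up is a hypothetical helper.

```python
import urllib.request


def tika_is_up(base_url: str = "http://localhost:9998", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/version answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/version", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections and timeouts
        return False
```

A False result points at the server or network setup rather than at the path Open WebUI appends.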

Logs & Screenshots

HTTPConnectionPool(host='127.0.0.1', port=80): Max retries exceeded with url: /:9998/tika/tika/text (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa3900fad50>: Failed to establish a new connection: [Errno 111] Connection refused'))
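
The host, port, and path in this log (127.0.0.1, 80, /:9998/tika/tika/text) are what results when the port ends up in the path instead of the authority part of the URL. A minimal illustration with Python's urllib.parse, assuming (this is not confirmed by the report) that the URL actually requested was shaped like http://127.0.0.1/:9998/tika/tika/text:

```python
from urllib.parse import urlsplit

# A "/" before ":9998" makes the port part of the path, not the netloc.
parts = urlsplit("http://127.0.0.1/:9998/tika/tika/text")
print(parts.hostname)  # 127.0.0.1
print(parts.port)      # None, so plain HTTP falls back to port 80
print(parts.path)      # /:9998/tika/tika/text
```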

And here is my modified code:

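[Screenshot of the modified code: https://github.com/user-attachments/assets/f38bd88a-93ee-4f9c-a5a5-c87a6de8a3b7]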

Additional Information

If you would like more details about the root cause of this issue or a further explanation of my solution, please leave a comment below.

GiteaMirror added the bug label 2026-04-20 00:50:20 -05:00

Reference: github-starred/open-webui#18626