issue: Tika is unable to extract text from .rtf when it's attached from Windows browser #6606

Closed
opened 2025-11-11 17:00:52 -06:00 by GiteaMirror · 2 comments

Originally created by @mkhludnev on GitHub (Oct 7, 2025).

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

0.6.26

Ollama Version (if applicable)

No response

Operating System

Windows

Browser (if applicable)

Chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The Tika loader extracts text from an .rtf file attached from a browser running on Windows.

Actual Behavior

With the Tika loader (at least), attaching an .rtf file from a browser running on Windows submits the wrong Content-Type: application/ms-word. That value is passed on to Tika and causes an extraction error.

  1. If I send the file with curl, passing Content-Type: application/rtf, or use a browser on Linux, it works fine. This is a known Windows issue.
  2. FWIW, the same file compressed into a .zip is recognized by Tika successfully.
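
The comparison in item 1 can be reproduced without a browser. Below is a minimal sketch, not Open WebUI's actual code. It assumes a Tika server listening at http://localhost:9998 and uses the tika/text endpoint mentioned in this thread; the URL and the `build_tika_request` helper are hypothetical names introduced for illustration. It only builds the request, so the same bytes can be sent once per Content-Type value and the responses compared.

```python
import urllib.request

# Assumption: a local Tika server; adjust to your deployment.
TIKA_URL = "http://localhost:9998/tika/text"


def build_tika_request(data: bytes, content_type: str) -> urllib.request.Request:
    """Build a PUT request to Tika that declares the given Content-Type."""
    return urllib.request.Request(
        TIKA_URL,
        data=data,
        headers={"Content-Type": content_type},
        method="PUT",
    )


# The two variants described in the report: same bytes, different declared type.
sample = b"{\\rtf1 hello}"
as_windows_browser = build_tika_request(sample, "application/ms-word")
as_curl = build_tika_request(sample, "application/rtf")
```

Passing either request to `urllib.request.urlopen` sends it; per the report, the first variant fails to extract and the second succeeds.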

Steps to Reproduce

Enable Tika for content extraction.
Attach an .rtf file from a browser running on Windows.

Logs & Screenshots

Additional Information

No response

GiteaMirror added the bug, good first issue, non-core labels 2025-11-11 17:00:52 -06:00

@mkhludnev commented on GitHub (Oct 7, 2025):

Quick check: it works if we simply don't pass Content-Type to tika/text, since the browser-supplied value might be incorrect.
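
One way to sketch that idea, assuming the loader builds a headers dict before calling Tika: forward the browser-supplied type only when it is plausible for the file extension, and otherwise send no Content-Type so Tika detects the format itself. The `tika_headers` helper and the `EXPECTED_TYPES` table are hypothetical and not part of Open WebUI; the table only covers the .rtf case from this report.

```python
import os
from typing import Dict, Optional

# Assumption: types considered plausible for an extension. Only .rtf is listed
# because that is the case reported here; extend as needed.
EXPECTED_TYPES = {
    ".rtf": {"application/rtf", "text/rtf"},
}


def tika_headers(filename: str, browser_type: Optional[str]) -> Dict[str, str]:
    """Return request headers for Tika, omitting a Content-Type that looks wrong."""
    if not browser_type:
        # Nothing was supplied, so let Tika auto-detect.
        return {}
    ext = os.path.splitext(filename)[1].lower()
    expected = EXPECTED_TYPES.get(ext)
    if expected is not None and browser_type.lower() not in expected:
        # The declared type contradicts the extension: drop it.
        return {}
    return {"Content-Type": browser_type}
```

For example, `tika_headers("notes.rtf", "application/ms-word")` returns an empty dict, while `tika_headers("notes.rtf", "application/rtf")` keeps the header. Dropping the header unconditionally, as in the quick check, is the simpler variant.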


@tjbck commented on GitHub (Oct 28, 2025):

Open to PRs!


Reference: github-starred/open-webui#6606