issue: Document processing (Tika) times out silently after 60 s from frontend #5568

Closed
opened 2025-11-11 16:24:42 -06:00 by GiteaMirror · 1 comment

Originally created by @jakehlee on GitHub (Jun 17, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.15

Ollama Version (if applicable)

v0.9.0

Operating System

RHEL 9.5

Browser (if applicable)

Chrome 137.0

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
    • Start with the initial platform/version/OS and dependencies used,
    • Specify exact install/launch/configure commands,
    • List URLs visited, user input (incl. example values/emails/passwords if needed),
    • Describe all options and toggles enabled or changed,
    • Include any files or environmental changes,
    • Identify the expected and actual result at each stage,
    • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

  1. The file upload should have failed with an error message indicating that the document processing has timed out, OR
  2. There should be a configuration to increase the frontend timeout for document processing.

Actual Behavior

After dragging and dropping the file into the browser, a spinning file icon indicated that the file was being uploaded. After 60 seconds, the icon disappeared without a warning message, even though the logs showed that Tika was still processing the file, which was eventually added to the document store.

Steps to Reproduce

  1. Run the docker://ollama/ollama image with its configuration.

  2. Run the docker://ghcr.io/open-webui/open-webui:main image with the following relevant environment variables:

AIOHTTP_CLIENT_TIMEOUT=300
ENABLE_WEBSOCKET_SUPPORT=True
PDF_EXTRACT_IMAGES=True
CONTENT_EXTRACTION_ENGINE=Tika
TIKA_SERVER_URL="http://localhost:9998"
  3. Run the docker://apache/tika:latest-full image with the following custom configuration:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <server>
    <taskTimeoutMillis>300000</taskTimeoutMillis> <!-- 5 minutes -->
  </server>
</properties>
  4. Run the docker://nginx:alpine image with the following custom configuration:
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate example.crt;
    ssl_certificate_key example.key;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_buffering off;

        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        client_max_body_size 50M;
        proxy_read_timeout 5m;
        proxy_connect_timeout 5m;
        proxy_send_timeout 5m;
    }
}

Finally, drag and drop a large PDF containing many images that require OCR into open-webui.

Logs & Screenshots

The following is all shown in chronological order.

Immediately after file drag-and-drop:

[Screenshot; image not preserved]

Server logs:

(first time log message)

[...] | INFO  [qtp1727424614-113] org.apache.tika.parser.ocr.TesseractOCRParser Tesseract is installed and is being invoked. This can add greatly to processing time.  If you do not want tesseract to be applied to your files see: https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr

Subsequently,

2025-06-17 02:25:56.739 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.20.2:0 - "GET /_app/version.json HTTP/1.1" 200 - {}
192.168.20.2 - - [17/Jun/2025:02:25:56 -0700] "GET /_app/version.json HTTP/1.1" 200 54 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
2025-06-17 02:26:12.900 | INFO     | open_webui.routers.files:upload_file:94 - file.content_type: application/pdf - {}
INFO  [qtp1727424614-96] 09:26:12,913 org.apache.tika.server.core.resource.TikaResource /tika (application/pdf)
2025-06-17 02:26:57.739 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.20.2:0 - "GET /_app/version.json HTTP/1.1" 200 - {}
192.168.20.2 - - [17/Jun/2025:02:26:57 -0700] "GET /_app/version.json HTTP/1.1" 200 54 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
192.168.20.2 - - [17/Jun/2025:02:27:12 -0700] "POST /api/v1/files/ HTTP/1.1" 499 0 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
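
The "60 s" figure can be checked against the log excerpt above. The following is a minimal sketch that subtracts the upload_file timestamp from the timestamp of the nginx access-log entry with status 499 (nginx's non-standard code for a client that closed the connection before a response was sent). It assumes both logs use the same local clock, which the matching 02:25:56 entries suggest; the helper names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the log lines above.
upload_started = "2025-06-17 02:26:12.900"     # open_webui upload_file entry
client_aborted = "17/Jun/2025:02:27:12 -0700"  # nginx access log, status 499


def parse_app_ts(text: str) -> datetime:
    """Parse the Open WebUI timestamp, which carries no UTC offset."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def parse_nginx_ts(text: str) -> datetime:
    """Parse the nginx access-log timestamp and drop its UTC offset.

    Assumption: both logs are written in the same local time zone.
    """
    parsed = datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")
    return parsed.replace(tzinfo=None)


elapsed = parse_nginx_ts(client_aborted) - parse_app_ts(upload_started)
print(f"{elapsed.total_seconds():.1f} s between upload start and the 499")
# prints "59.1 s between upload start and the 499"
```

The nginx timestamp has one-second resolution, so the result is consistent with a cutoff of about 60 seconds on the client side.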

At this point, the file upload disappears silently from open-webui without a warning message, with the following in the console logs:

           POST https://example.com/api/v1/files/ net::ERR_EMPTY_RESPONSE
window.fetch @ fetcher.js:76
s @ index.ts:12
kt @ MessageInput.svelte:277
(anonymous) @ MessageInput.svelte:398
_t @ MessageInput.svelte:323
fn @ MessageInput.svelte:426
index.ts:26 TypeError: Failed to fetch
    at window.fetch (fetcher.js:76:10)
    at s (index.ts:12:20)
    at kt (MessageInput.svelte:277:31)
    at MessageInput.svelte:398:5
    at Array.forEach (<anonymous>)
    at _t (MessageInput.svelte:323:14)
    at HTMLDivElement.fn (MessageInput.svelte:426:5)
(anonymous) @ index.ts:26
Promise.catch
s @ index.ts:24
kt @ MessageInput.svelte:277
(anonymous) @ MessageInput.svelte:398
_t @ MessageInput.svelte:323
fn @ MessageInput.svelte:426

Meanwhile, the backend keeps going:

2025-06-17 02:27:56.462 | INFO     | open_webui.routers.retrieval:save_docs_to_vector_db:1125 - save_docs_to_vector_db: document test.pdf file-3a6c2f72-8aae-4826-8115-c0c341a34ad1 - {}
2025-06-17 02:27:56.467 | INFO     | open_webui.routers.retrieval:save_docs_to_vector_db:1208 - adding to collection file-3a6c2f72-8aae-4826-8115-c0c341a34ad1 - {}
Batches:   0%|          | 0/3 [00:00<?, ?it/s]2025-06-17 02:27:58.727 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.20.2:0 - "GET /_app/version.json HTTP/1.1" 200 - {}
192.168.20.2 - - [17/Jun/2025:02:27:58 -0700] "GET /_app/version.json HTTP/1.1" 200 54 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
Batches: 100%|██████████| 3/3 [00:05<00:00,  1.79s/it]

This indicates that Tika did, in fact, return a valid extraction and that it was added to the vector_db; however, the file remains missing from the frontend.

Additional Information

  • Is the NGINX reverse proxy timing out?
    • No. Only open-webui is behind the reverse proxy; Tika and OWUI are on the same localhost network.
  • Is Tika accessible?
    • Yes, it is clearly working on the provided PDF, and it works for other smaller PDFs without OCR.
  • AIOHTTP_CLIENT_TIMEOUT?
    • It was set to 300.
  • NGINX timeout?
    • All timeouts were set to 5m.
  • Is Tika's task timeout still 60 seconds?
    • No. The configuration I shared sets taskTimeoutMillis to 300000 (5 minutes), and Tika does finish running.
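
The checks above amount to comparing each configured timeout with the observed cutoff of roughly 60 seconds. A minimal sketch of that comparison follows, using only the values reported in this issue. The dictionary keys are descriptive labels chosen for this example, not configuration names to copy verbatim.

```python
# Timeouts reported in this issue, normalized to seconds.
configured_timeouts_s = {
    "open-webui AIOHTTP_CLIENT_TIMEOUT": 300,
    "tika taskTimeoutMillis": 300_000 / 1000,
    "nginx proxy_read_timeout": 5 * 60,
    "nginx proxy_connect_timeout": 5 * 60,
    "nginx proxy_send_timeout": 5 * 60,
}

# The upload was dropped roughly this long after it started.
observed_cutoff_s = 60

# Any setting at or below the cutoff would be a plausible culprit.
suspects = [
    name
    for name, seconds in configured_timeouts_s.items()
    if seconds <= observed_cutoff_s
]

if suspects:
    print("Possible culprits:", ", ".join(suspects))
else:
    print("No configured timeout explains a 60 s cutoff")
# prints "No configured timeout explains a 60 s cutoff"
```

Since none of the listed server-side settings is as short as 60 seconds, the cutoff has to come from somewhere not covered by this configuration, such as the client.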

Thank you!!!

GiteaMirror added the bug label 2025-11-11 16:24:42 -06:00

@jakehlee commented on GitHub (Jun 17, 2025):

Disregard - this seems to be an issue with browser configuration. It will happily wait for several minutes with a different browser.
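
One way to separate a browser-side limit from a server-side one is to repeat the upload outside the browser with an explicit, generous timeout. The sketch below uses only the Python standard library and the /api/v1/files/ path seen in the logs. The form field name "file", the Bearer-token header, and the helper names are assumptions made for this example and may not match a given Open WebUI deployment.

```python
import os
import urllib.request
import uuid


def build_multipart(field, filename, payload, content_type):
    """Encode one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload(base_url, token, path, timeout_s=300.0):
    """POST a PDF to the upload endpoint and return the HTTP status.

    timeout_s applies to each blocking socket operation, so a slow but
    responsive server is given up to that long to start answering.
    """
    with open(path, "rb") as fh:
        body, ctype = build_multipart(
            "file", os.path.basename(path), fh.read(), "application/pdf"
        )
    request = urllib.request.Request(
        f"{base_url}/api/v1/files/",
        data=body,
        method="POST",
        headers={"Content-Type": ctype, "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status


# Hypothetical usage (URL, token, and file name are placeholders):
# status = upload("https://example.com", "<api token>", "test.pdf")
```

If this request completes after several minutes while the browser upload is still cut off at 60 seconds, the limit is on the browser side, which matches the conclusion in the comment above.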

Reference: github-starred/open-webui#5568