[GH-ISSUE #18495] feat: Add configurable timeout for MinerU document processing API #18614

Closed
opened 2026-04-20 00:49:43 -05:00 by GiteaMirror · 1 comment

Originally created by @manu-benoit on GitHub (Oct 21, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18495

Check Existing Issues

  • I have searched all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Problem Description

The integration of MinerU is very cool! I'm using it for all my documents, but I kept hitting API timeouts.
The MinerU document loader has a hardcoded 300-second (5-minute) timeout that causes failures when processing large PDF documents (>500MB, 1000+ pages).

ReadTimeoutError: HTTPConnectionPool(host='192.168.77.34', port=8000): Read timed out. (read timeout=300)

The timeout occurs while MinerU is still actively processing the document, resulting in failed uploads even though the backend service is functioning correctly.

Desired Solution you'd like

Add a configurable timeout setting for MinerU API calls, similar to existing timeout configurations for other services.

Proposed implementation:
Environment Variable:
# Timeout in seconds (default: 300)
MINERU_API_TIMEOUT=7200

Configuration in config.py:
MINERU_API_TIMEOUT = int(os.getenv("MINERU_API_TIMEOUT", "300"))
Use in retrieval/loaders/mineru.py:

Current (Line 120):
response = requests.post(
    f"{self.api_url}/file_parse",
    data=form_data,
    files=files,
    timeout=300,  # Hardcoded
)

Proposed:
from open_webui.config import MINERU_API_TIMEOUT

response = requests.post(
    f"{self.api_url}/file_parse",
    data=form_data,
    files=files,
    timeout=MINERU_API_TIMEOUT,  # Configurable
)
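requests also accepts a (connect, read) tuple for `timeout`, and the read part limits how long it waits without receiving any bytes on the socket, not the total request time. Assuming MinerU sends nothing until parsing finishes (as the read timeout in the report suggests), the read value is effectively the processing budget, while the connect value can stay short so an unreachable server still fails fast. A minimal sketch; the 10-second connect timeout and the `mineru_timeout` and `post_file_parse` helpers are hypothetical:

```python
import requests

# Assumption: 10 s is ample to open a TCP connection to a LAN-hosted MinerU.
DEFAULT_CONNECT_TIMEOUT = 10.0


def mineru_timeout(read_timeout: float, connect_timeout: float = DEFAULT_CONNECT_TIMEOUT) -> tuple:
    """Build the (connect, read) tuple that requests accepts for `timeout`."""
    return (float(connect_timeout), float(read_timeout))


def post_file_parse(api_url: str, form_data: dict, files: dict, read_timeout: float) -> requests.Response:
    """Hypothetical wrapper around the /file_parse call shown above."""
    try:
        response = requests.post(
            f"{api_url}/file_parse",
            data=form_data,
            files=files,
            timeout=mineru_timeout(read_timeout),
        )
    except requests.exceptions.ReadTimeout as exc:
        # Connected fine, but no bytes arrived within read_timeout seconds.
        raise RuntimeError(
            f"MinerU did not respond within {read_timeout} s; "
            "consider raising MINERU_API_TIMEOUT"
        ) from exc
    response.raise_for_status()
    return response
```

Splitting the two values means a long MINERU_API_TIMEOUT does not also make a misconfigured host hang for hours before failing.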

The same change should apply to:

  • Line 120: file_parse endpoint
  • Line 322: Large file upload
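To keep the two call sites from drifting apart again, the timeout could be applied once on a shared session rather than at each call. A sketch assuming both calls go through requests; `TimeoutSession` is a hypothetical name, and this subclassing pattern is not a built-in feature of requests or open-webui:

```python
import requests


class TimeoutSession(requests.Session):
    """Session that applies a default timeout to every request.

    Hypothetical helper: an explicit timeout= passed by a caller still wins.
    """

    def __init__(self, timeout):
        super().__init__()
        self.timeout = timeout

    def request(self, method, url, **kwargs):
        # Session.post/get/put all forward to request(); fill in timeout only if absent.
        kwargs.setdefault("timeout", self.timeout)
        return super().request(method, url, **kwargs)


# Usage sketch: both the file_parse call and the large-file upload share one session.
# session = TimeoutSession(timeout=(10, MINERU_API_TIMEOUT))
# session.post(f"{api_url}/file_parse", data=form_data, files=files)
```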

Alternatives Considered

  1. Database configuration: Store the timeout in the config table alongside other MinerU settings
  2. Admin UI setting: Add a timeout field under Documents > Content Extraction settings
  3. Per-request timeout: Allow the timeout to be specified in the API request
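If a per-request timeout were combined with a configured value, the loader would need a rule for which one wins. One possible rule, sketched with a hypothetical `resolve_timeout` function: a per-request value is honored but capped by an admin-configured maximum, and the configured default applies when no usable per-request value is given.

```python
from typing import Optional


def resolve_timeout(requested: Optional[float], default: float, maximum: float) -> float:
    """Pick the effective MinerU timeout in seconds.

    Hypothetical precedence: per-request value (capped at `maximum`),
    otherwise the configured default. Non-positive requests are ignored.
    """
    if requested is None or requested <= 0:
        return default
    return min(requested, maximum)
```

The cap keeps a single API caller from tying up a worker indefinitely, which is one reason an admin-side limit might be kept even if per-request overrides were allowed.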

Additional Context

Related Issues:

  • #17247 (Docling timeout - same issue)
  • #15023 (File upload async processing)
  • #11345 (5 minute streaming timeout)

Use Case:
Processing engineering documents (building codes, standards) that are 750MB+ with 1200+ pages requires 15-20 minutes of processing time. The current 5-minute timeout makes these documents impossible to process.
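The figures above imply a rough rate: 15 to 20 minutes (900 to 1,200 s) for 1,200 pages is about 0.75 to 1.0 s per page. A sketch that turns a page count into a suggested timeout, assuming 1.0 s/page and a 2x safety margin; both defaults and the `suggest_timeout` name are illustrative assumptions, and real throughput depends on the MinerU backend and hardware:

```python
import math


def suggest_timeout(pages: int, seconds_per_page: float = 1.0, safety: float = 2.0, floor: int = 300) -> int:
    """Rough MinerU timeout estimate in seconds, never below the current 300 s default."""
    return max(floor, math.ceil(pages * seconds_per_page * safety))
```

For the 1,200-page case this gives 2,400 s (40 minutes), well within the 7,200 s example value proposed above.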


@tjbck commented on GitHub (Oct 23, 2025):

PR welcome.


Reference: github-starred/open-webui#18614