Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 11:28:35 -05:00)
[GH-ISSUE #18495] feat: Add configurable timeout for MinerU document processing API #18614
Originally created by @manu-benoit on GitHub (Oct 21, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18495
Problem Description
The MinerU integration is very cool! I'm using it for all documents, but I have been running into API timeouts.
The MinerU document loader has a hardcoded 300-second (5-minute) timeout that causes failures when processing large PDF documents (>500 MB, 1000+ pages):
```
ReadTimeoutError: HTTPConnectionPool(host='192.168.77.34', port=8000): Read timed out. (read timeout=300)
```
The timeout occurs while MinerU is still actively processing the document, resulting in failed uploads even though the backend service is functioning correctly.
Desired Solution you'd like
Add a configurable timeout setting for MinerU API calls, similar to existing timeout configurations for other services.
Proposed implementation:
Environment Variable:
```
MINERU_API_TIMEOUT=7200  # Default: 300, in seconds
```
Configuration in config.py:
```python
import os

MINERU_API_TIMEOUT = int(os.getenv("MINERU_API_TIMEOUT", "300"))
```
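The one-liner in config.py raises `ValueError` at import time if the variable is set to something that is not an integer. A slightly more defensive variant is sketched below. The helper name `read_timeout_setting` is hypothetical and is not how Open WebUI actually loads its settings; it only illustrates reading a positive integer with a fallback to the current 300-second value.

```python
import os

DEFAULT_MINERU_API_TIMEOUT = 300  # seconds; same as the current hardcoded value


def read_timeout_setting(name: str = "MINERU_API_TIMEOUT",
                         default: int = DEFAULT_MINERU_API_TIMEOUT) -> int:
    """Read a positive integer timeout (seconds) from the environment.

    Hypothetical helper: returns `default` when the variable is unset,
    empty, not an integer, or not positive, instead of raising.
    """
    raw = os.getenv(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    # Zero or negative timeouts make no sense for a blocking HTTP call.
    return value if value > 0 else default


MINERU_API_TIMEOUT = read_timeout_setting()
```

Falling back silently is a design choice; logging a warning before returning the default would make a misconfigured value easier to spot.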
Use in retrieval/loaders/mineru.py.
Current (line 120):
```python
response = requests.post(
    f"{self.api_url}/file_parse",
    data=form_data,
    files=files,
    timeout=300,  # Hardcoded
)
```
Proposed:
```python
from open_webui.config import MINERU_API_TIMEOUT

response = requests.post(
    f"{self.api_url}/file_parse",
    data=form_data,
    files=files,
    timeout=MINERU_API_TIMEOUT,  # Configurable
)
```
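A possible refinement beyond what the issue proposes (an assumption, not part of the request): `requests` also accepts `timeout` as a `(connect, read)` tuple, whereas a single number is applied to both phases. The error in the log is a *read* timeout, so only the read value needs to be large; a short connect timeout still fails fast when the MinerU host is unreachable. The helper `mineru_timeout` and its 10-second connect default are hypothetical.

```python
from typing import Tuple


def mineru_timeout(read_timeout: float,
                   connect_timeout: float = 10.0) -> Tuple[float, float]:
    """Build a (connect, read) timeout pair for requests (hypothetical helper).

    requests uses the first value for establishing the connection and the
    second for how long to wait on the server between bytes, which is the
    limit a long MinerU parse runs into before any response arrives.
    """
    if read_timeout <= 0 or connect_timeout <= 0:
        raise ValueError("timeouts must be positive")
    return (float(connect_timeout), float(read_timeout))


# Sketch of use in the loader:
# response = requests.post(
#     f"{self.api_url}/file_parse",
#     data=form_data,
#     files=files,
#     timeout=mineru_timeout(MINERU_API_TIMEOUT),
# )
```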
The same change should apply to:
file_parse endpoint
Alternatives Considered
config table alongside other MinerU settings
Additional Context
Related Issues:
Use Case:
Processing engineering documents (building codes, standards) that are 750 MB+ with 1200+ pages requires 15-20 minutes of processing time. The current 5-minute timeout makes these documents impossible to process.
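Putting numbers on the use case: the reported processing time exceeds the current limit by a factor of 3 to 4, while the example value `MINERU_API_TIMEOUT=7200` leaves about 6 times headroom over the slowest reported run.

```latex
t_{\text{needed}} \approx 15\text{ to }20\ \text{min} = 900\text{ to }1200\ \text{s},
\qquad
\frac{t_{\text{needed}}}{300\ \text{s}} = 3\text{ to }4,
\qquad
\frac{7200\ \text{s}}{1200\ \text{s}} = 6
```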
@tjbck commented on GitHub (Oct 23, 2025):
PR welcome.