mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-08 12:58:11 -05:00
[GH-ISSUE #14729] issue: default document loader can't handle some PDF articles #32877
Originally created by @astroboylrx on GitHub (Jun 6, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/14729
Installation Method
Pip Install
Open WebUI Version
v0.6.13
Ollama Version (if applicable)
No response
Operating System
macOS Sequoia 15.5
Browser (if applicable)
Chrome 137.0.7151.68, Safari 18.5
Expected Behavior
When a PDF is dragged into a chat with any model, the backend should extract the text from that PDF.
Actual Behavior
For certain PDF files, the default loader fails to extract text from the PDF.
Steps to Reproduce
Drag this PDF to your WebUI interface:
s41550-023-01945-7.pdf
Logs & Screenshots
The only relevant log is:
Additional Information
No response
@tjbck commented on GitHub (Jun 6, 2025):
Tika is recommended.
@astroboylrx commented on GitHub (Jun 6, 2025):
Okay, let me rephrase.
Would it be possible to let users route PDFs to an external loader while still passing other document types to the default loader?
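To make the request concrete, here is a minimal sketch of per-extension routing. It is not Open WebUI's actual code: `make_router`, `default_loader`, and `external_pdf_loader` are hypothetical names, and the two loaders are stand-ins that return tagged strings rather than real extracted text.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical loader signature: takes a file path, returns extracted text.
Loader = Callable[[str], str]


def make_router(default_loader: Loader, overrides: Dict[str, Loader]) -> Loader:
    """Return a loader that dispatches on file extension."""
    # Normalize keys so "pdf", ".pdf", and ".PDF" all map to ".pdf".
    table = {"." + ext.lower().lstrip("."): fn for ext, fn in overrides.items()}

    def route(path: str) -> str:
        suffix = Path(path).suffix.lower()
        # Fall back to the default loader for any extension without an override.
        return table.get(suffix, default_loader)(path)

    return route


# Stand-in loaders for illustration only; real ones would call the
# built-in loader or an external service such as Tika.
def default_loader(path: str) -> str:
    return f"default:{path}"


def external_pdf_loader(path: str) -> str:
    return f"external:{path}"


load = make_router(default_loader, {"pdf": external_pdf_loader})
```

With this setup, `load("paper.pdf")` goes to the external loader and `load("notes.txt")` goes to the default one. An override table keyed by extension keeps the default path untouched for every type that is not explicitly redirected.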
@mykola-mmm commented on GitHub (Jun 27, 2025):
@tjbck @astroboylrx I encountered the same issue when working with my company's internal documentation. Updating langchain and langchain-community to their newest versions helped.
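Before upgrading, it can help to see which versions are currently installed. The sketch below uses only the standard library and reports the two packages named above. The helper name `installed_versions` is an assumption for illustration, and the thread does not say which release resolves the problem.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


versions = installed_versions(["langchain", "langchain-community"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Run it in the same environment that Open WebUI runs in, since a pip install may have several Python environments on one machine.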