[PR #1050] [MERGED] feat: added ocr functionality to the pdf loader #7360

Closed
opened 2025-11-11 17:24:28 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/1050
Author: @jannikstdl
Created: 3/5/2024
Status: Merged
Merged: 3/6/2024
Merged by: @tjbck

Base: dev ← Head: rag-pdf-ocr


📝 Commits (1)

  • 089a63e feat: added ocr functionality to the pdf loader

📊 Changes

2 files changed (+2 additions, -1 deletions)

📝 backend/apps/rag/main.py (+1 -1)
📝 backend/requirements.txt (+1 -0)

📄 Description

Added OCR functionality to the PDF loader.

File used for the test:
This is a pdf file with a screenshot.pdf (https://github.com/open-webui/open-webui/files/14501663/This.is.a.pdf.file.with.a.screenshot.pdf)

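Before:
[image] https://github.com/open-webui/open-webui/assets/69747628/b53f38c7-c363-4256-b72f-f90b785c3ceb

After:
[image] https://github.com/open-webui/open-webui/assets/69747628/b01c876d-07fb-462f-bb67-af08b7579cdd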

Resources:
langchain (https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf#extracting-images)
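
The diff itself is not reproduced on this page, so the following is only a minimal sketch of how OCR can be switched on for a langchain PDF loader, based on the "extracting images" section of the linked langchain docs. It assumes the loader is `PyPDFLoader` from `langchain_community` and that the `rapidocr-onnxruntime` package is installed for the OCR step. The names `pdf_loader_kwargs` and `load_pdf` are hypothetical helpers for illustration and are not taken from the PR.

```python
from typing import Any


def pdf_loader_kwargs(enable_ocr: bool) -> dict[str, Any]:
    """Keyword arguments for PyPDFLoader (hypothetical helper, not from the PR)."""
    # extract_images=True asks the loader to run OCR on images embedded in the
    # PDF; langchain relies on the rapidocr-onnxruntime package for that step.
    return {"extract_images": True} if enable_ocr else {}


def load_pdf(file_path: str, enable_ocr: bool = True):
    """Load a PDF into langchain Documents, optionally with OCR on images."""
    # Imported lazily so the helper above can be used without langchain installed.
    from langchain_community.document_loaders import PyPDFLoader

    loader = PyPDFLoader(file_path, **pdf_loader_kwargs(enable_ocr))
    return loader.load()
```

With OCR enabled, text recognized inside screenshots should end up in the `page_content` of the returned documents alongside the regular PDF text, which would match the Before/After comparison above.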


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-11 17:24:28 -06:00
Reference: github-starred/open-webui#7360