mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[PR #18604] [CLOSED] Add DeepSeek OCR integration for document processing #40481
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/18604
Author: @devrandom
Created: 10/24/2025
Status: ❌ Closed
Base:
main← Head:claude/integrate-deepseek-ocr-011CURd6ZNYLEHnxfT6CB6Hf📝 Commits (1)
281b2ddAdd DeepSeek OCR integration for document processing📊 Changes
5 files changed (+478 additions, -0 deletions)
View changed files
📝
backend/open_webui/config.py(+37 -0)➕
backend/open_webui/retrieval/loaders/deepseek_ocr.py(+372 -0)📝
backend/open_webui/retrieval/loaders/main.py(+17 -0)📝
backend/open_webui/routers/retrieval.py(+51 -0)📝
pyproject.toml(+1 -0)📄 Description
This commit integrates DeepSeek OCR as a new content extraction engine using vLLM server with OpenAI-compatible API.
Changes:
Added DeepSeekOCRLoader class for API-based OCR processing
Added relevant configuration variables in config.py.
Integrated into loader factory (main.py):
Updated retrieval router API:
Added pdf2image dependency for PDF processing
Usage:
Set environment variables:
CONTENT_EXTRACTION_ENGINE=deepseek_ocr DEEPSEEK_OCR_API_BASE_URL=http://your-gpu-server:8000/v1 DEEPSEEK_OCR_API_KEY=your-secret-key
🤖 Generated with Claude Code
Pull Request Checklist
Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.
Before submitting, make sure you've checked the following:
devbranch. Not targeting thedevbranch may lead to immediate closure of the PR.Changelog Entry
Description
Added
Changed
Deprecated
Removed
Fixed
Security
Breaking Changes
Additional Information
Screenshots or Videos
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.