[PR #12307] [MERGED] feat: Added support for Mistral OCR for Content Extraction #61752

Closed
opened 2026-05-06 05:24:02 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/12307
Author: @paddy313
Created: 4/1/2025
Status: Merged
Merged: 4/2/2025
Merged by: @tjbck

Base: dev ← Head: feature/mistral_ocr


📝 Commits (4)

  • 1ac6879 Add Mistral OCR integration and configuration support
  • 93d7702 refactor: move MistralLoader to a separate module and just use the requests package instead of mistralai
  • c5a8d2f refactor: update MistralLoader documentation and adjust parameters for signed URL retrieval
  • 0ac00b9 refactor: update import path for MistralLoader

📊 Changes

6 files changed (+282 additions, -5 deletions)


📝 backend/open_webui/config.py (+5 -0)
📝 backend/open_webui/main.py (+2 -0)
📝 backend/open_webui/retrieval/loaders/main.py (+12 -0)
➕ backend/open_webui/retrieval/loaders/mistral.py (+225 -0)
📝 backend/open_webui/routers/retrieval.py (+16 -0)
📝 src/lib/components/admin/Settings/Documents.svelte (+22 -5)

📄 Description

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • [x] Target branch: Please verify that the pull request targets the dev branch.
  • [x] Description: Provide a concise description of the changes made in this pull request.
  • [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • [ ] Documentation: Have you updated relevant documentation Open WebUI Docs, or other documentation sources?
  • [x] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • [ ] Testing: Have you written and run sufficient tests to validate the changes?
  • [x] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • [x] Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • feat: Introduces a new feature or enhancement to the codebase

Changelog Entry

Description

As proposed in discussions #11386 and #12062, this PR introduces Mistral OCR support for PDF content extraction in Open WebUI. Mistral OCR excels at interpreting complex document elements (images, math, tables, and LaTeX layouts), which enables deeper understanding of rich documents such as scientific papers with charts, graphs, and equations.

I integrated Mistral OCR into both the backend and the frontend of Open WebUI. The changes add configuration options, implement the Mistral OCR loader, and update the relevant components to handle the new OCR engine.

Added

  • Added MISTRAL_OCR_API_KEY configuration in backend/open_webui/config.py and updated the application state to include this key (backend/open_webui/config.py, backend/open_webui/main.py, backend/open_webui/routers/retrieval.py).
  • Implemented MistralLoader class for handling OCR processing via Mistral API in backend/open_webui/retrieval/loaders/mistral.py.
  • Added Mistral OCR configuration fields and validation in src/lib/components/admin/Settings/Documents.svelte.
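
As a rough illustration of the loader described above, the sketch below walks through an upload, signed URL, OCR sequence, in line with the commit note about signed URL retrieval. It is not the code from this PR. The base URL, endpoint paths, field names, model name, and the function name `run_mistral_ocr` are all assumptions made for the example. The `session` argument is any object with `requests.Session`-style `get`/`post` methods, so the sketch carries no import of its own.

```python
# Assumed base URL; not taken from the PR.
MISTRAL_API_BASE = "https://api.mistral.ai/v1"


def run_mistral_ocr(session, api_key, file_name, file_bytes):
    """Upload a PDF, fetch a signed URL for it, then request OCR.

    `session` is any object exposing requests.Session-style get/post.
    Endpoints and payload fields below are assumptions for illustration.
    """
    headers = {"Authorization": f"Bearer {api_key}"}

    # Step 1: upload the file (assumed endpoint and "purpose" field).
    upload = session.post(
        f"{MISTRAL_API_BASE}/files",
        headers=headers,
        files={"file": (file_name, file_bytes, "application/pdf")},
        data={"purpose": "ocr"},
    )
    upload.raise_for_status()
    file_id = upload.json()["id"]

    # Step 2: retrieve a short-lived signed URL for the uploaded file.
    signed = session.get(
        f"{MISTRAL_API_BASE}/files/{file_id}/url",
        headers=headers,
        params={"expiry": 1},
    )
    signed.raise_for_status()
    signed_url = signed.json()["url"]

    # Step 3: run OCR against the signed URL (assumed model name).
    ocr = session.post(
        f"{MISTRAL_API_BASE}/ocr",
        headers=headers,
        json={
            "model": "mistral-ocr-latest",
            "document": {"type": "document_url", "document_url": signed_url},
        },
    )
    ocr.raise_for_status()
    return ocr.json()
```

In practice one would pass a `requests.Session()` and the value of `MISTRAL_OCR_API_KEY`; taking the session as a parameter also makes the flow easy to exercise with a stub.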

Changed

  • Changed the _get_loader method in backend/open_webui/retrieval/loaders/main.py to include the Mistral OCR loader.
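
The kind of dispatch this change implies could look roughly like the sketch below. The function name `select_loader_name`, the engine string `"mistral_ocr"`, and the fallback label are hypothetical; the actual `_get_loader` method handles many more engines and file types.

```python
def select_loader_name(engine, file_ext, mistral_api_key):
    """Pick a loader label for a file (hypothetical, simplified dispatch).

    The Mistral loader is chosen only when the engine is selected,
    an API key is configured, and the file is a PDF.
    """
    if (
        engine == "mistral_ocr"          # assumed engine identifier
        and mistral_api_key              # empty string or None means "not configured"
        and file_ext.lower() == "pdf"    # the PR targets PDF content extraction
    ):
        return "MistralLoader"
    return "default"
```

Falling back to a default loader when the key is missing keeps a misconfigured engine from breaking uploads, which is one reasonable design choice among several.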

Deprecated

  • None

Removed

  • None

Fixed

  • None

Security

  • None

Breaking Changes

  • None

Additional Information

  • Implemented using only the requests package; no additional Python package is required.
  • The OCR output is divided into pages.
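
A minimal sketch of the per-page split mentioned above, assuming the OCR response is a dict with a `pages` list whose entries carry `index` and `markdown` keys. That shape, the `PageDocument` class, and the metadata keys are assumptions for illustration, not the structures used in the PR.

```python
from dataclasses import dataclass, field


@dataclass
class PageDocument:
    """Hypothetical stand-in for a per-page document record."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def pages_to_documents(ocr_response):
    """Turn an OCR response into one document per non-empty page."""
    pages = ocr_response.get("pages", [])
    documents = []
    for position, page in enumerate(pages):
        text = (page.get("markdown") or "").strip()
        if not text:
            # Skip pages for which OCR produced no text.
            continue
        documents.append(
            PageDocument(
                page_content=text,
                metadata={
                    # Fall back to list position if the page has no index.
                    "page": page.get("index", position),
                    "total_pages": len(pages),
                },
            )
        )
    return documents
```

Keeping the page index in the metadata lets a retrieval result be traced back to its page in the source PDF.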

Screenshots or Videos

Screenshot: https://github.com/user-attachments/assets/63106c91-e4d7-4ead-bc5f-15018da1e233


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-06 05:24:02 -05:00
Reference: github-starred/open-webui#61752