feat: Support selecting OCR engine for Docling Server #4929

Closed
opened 2025-11-11 16:06:47 -06:00 by GiteaMirror · 1 comment

Originally created by @kpchen on GitHub (Apr 22, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

I have configured Docling, a powerful document extraction tool, in Open WebUI, but I'm unable to select the OCR engine and language from Open WebUI. This is a very useful feature of Docling.

Desired Solution you'd like

There seems to be a parameter named "ocr_engine" in the docling-serve API documentation, which we could expose in Open WebUI.

Alternatives Considered

No response

Additional Context

No response


@athoik commented on GitHub (Apr 26, 2025):

Hi,

I needed exactly the same thing.

You can manually add the OCR engine and OCR languages to params:

            params = {
                "image_export_mode": "placeholder",
                "table_mode": "accurate",
                "ocr_engine": "tesseract",
                "ocr_lang": ['ell','eng'],
            }

Here is a quick way to read the OCR engine and OCR languages from environment variables:

diff --git a/backend/open_webui/retrieval/loaders/main.py b/backend/open_webui/retrieval/loaders/main.py
index 24944bd8a..dd68e52a8 100644
--- a/backend/open_webui/retrieval/loaders/main.py
+++ b/backend/open_webui/retrieval/loaders/main.py
@@ -2,6 +2,7 @@ import requests
 import logging
 import ftfy
 import sys
+import os

 from langchain_community.document_loaders import (
     AzureAIDocumentIntelligenceLoader,
@@ -141,6 +142,16 @@ class DoclingLoader:
                 "table_mode": "accurate",
             }

+            # Read additional docling environment variables
+            ocr_engine_env = os.getenv("DOCLING_OCR_ENGINE", "").strip()
+            ocr_lang_env = os.getenv("DOCLING_OCR_LANG", "").strip()
+
+            if ocr_engine_env and ocr_lang_env:
+                params["ocr_engine"] = ocr_engine_env
+                params["ocr_lang"] = [
+                    lang.strip() for lang in ocr_lang_env.split(",") if lang.strip()
+                ]
+
             endpoint = f"{self.url}/v1alpha/convert/file"
             r = requests.post(endpoint, files=files, data=params)
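
For anyone who wants to try the idea outside the loader, the environment-variable logic in the diff can be sketched as a small standalone helper. This is only a sketch: the function name `docling_ocr_params` and its `environ` argument are hypothetical and not part of Open WebUI. It also differs slightly from the diff in that a value such as `","` that yields no language codes is treated as unset.

```python
import os


def docling_ocr_params(environ=None):
    """Build optional OCR params from DOCLING_OCR_ENGINE / DOCLING_OCR_LANG.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    env = os.environ if environ is None else environ

    engine = env.get("DOCLING_OCR_ENGINE", "").strip()
    # DOCLING_OCR_LANG is a comma-separated list, e.g. "ell,eng"
    langs = [
        lang.strip()
        for lang in env.get("DOCLING_OCR_LANG", "").split(",")
        if lang.strip()
    ]

    params = {}
    # As in the diff, OCR options are only added when both values are set.
    if engine and langs:
        params["ocr_engine"] = engine
        params["ocr_lang"] = langs
    return params
```

The result could then be merged into the existing dictionary with `params.update(docling_ocr_params())` before the request is sent.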

@tjbck is the approach acceptable? I can create a PR.


Reference: github-starred/open-webui#4929