mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 10:58:17 -05:00
[PR #20447] [CLOSED] feat: docling page numbering #25626
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/20447
Author: @NikolasTh90
Created: 1/7/2026
Status: ❌ Closed
Base: dev ← Head: feature/docling-page-numbering
📝 Commits (10+)
fe6783c Merge pull request #19030 from open-webui/dev
fc05e0a Merge pull request #19405 from open-webui/dev
e3faec6 Merge pull request #19416 from open-webui/dev
9899293 Merge pull request #19448 from open-webui/dev
140605e Merge pull request #19462 from open-webui/dev
6f1486f Merge pull request #19466 from open-webui/dev
d95f533 Merge pull request #19729 from open-webui/dev
a727153 0.6.43 (#20093)
0fd71c7 feat: Add page numbering for Docling OCR citations
3c90f74 style: Apply black code formatting to Python files
📊 Changes
5 files changed (+1010 additions, -8 deletions)
📝 backend/open_webui/config.py (+6 -0)
📝 backend/open_webui/retrieval/loaders/main.py (+146 -8)
➕ backend/tests/test_docling_page_extraction.py (+274 -0)
➕ docs/docling-page-extraction.md (+316 -0)
➕ test_docling_integration.py (+268 -0)
📄 Description
Pull Request Checklist
Note to first-time contributors: Please open a discussion post in Discussions to discuss your idea/fix with the community before creating a pull request, and describe your changes before submitting a pull request.
This is to ensure large feature PRs are discussed with the community first, before starting work on it. If the community does not want this feature or it is not relevant for Open WebUI as a project, it can be identified in the discussion before working on the feature and submitting the PR.
Before submitting, make sure you've checked the following:
dev branch. Not targeting the dev branch will lead to immediate closure of the PR.
Changelog Entry
Description
Enhanced Docling OCR integration to support page-level citations, improving user experience by providing specific page references in document citations. This feature brings Docling OCR in line with other OCR engines like Mistral OCR, enabling consistent citation behavior across different document processing engines.
Added
DOCLING_EXTRACT_PAGES environment variable to enable/disable page extraction (default: true)
Changed
DoclingLoader class in backend/open_webui/retrieval/loaders/main.py to parse structured JSON responses
backend/open_webui/config.py to include new DOCLING_EXTRACT_PAGES configuration option
Deprecated
Removed
Fixed
Security
Breaking Changes
Additional Information
This enhancement addresses the community need for page-specific citations when using Docling OCR. Previously, Docling-processed documents appeared as single text blocks without page references, making it difficult for users to locate specific content. With this change:
Related Discussion: https://github.com/open-webui/open-webui/discussions/20446
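As a rough illustration of the behavior described above, the sketch below reads the DOCLING_EXTRACT_PAGES flag and turns a page-structured response into one document per page. It is not the code from this PR: the response shape ({"pages": [{"page_no": ..., "text": ...}]}), the PageDoc type, the metadata keys page and total_pages, and the helper names env_flag and split_pages are all assumptions made for the example.

```python
import os
from dataclasses import dataclass, field


def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean environment variable; an unset variable yields `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class PageDoc:
    """Hypothetical stand-in for a loader document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_pages(response: dict, extract_pages: bool) -> list[PageDoc]:
    """Return one PageDoc per page, or a single combined PageDoc.

    Assumed (not confirmed) response shape:
        {"pages": [{"page_no": 1, "text": "..."}, ...]}
    """
    pages = response.get("pages") or []

    # Fallback: page extraction disabled, or no per-page data in the response.
    if not extract_pages or not pages:
        text = "\n\n".join(p.get("text", "") for p in pages)
        return [PageDoc(text or response.get("text", ""))]

    total = len(pages)
    return [
        PageDoc(
            p.get("text", ""),
            # Metadata keys are illustrative; a UI could render "Page 3 of 10" from them.
            {"page": p.get("page_no", index), "total_pages": total},
        )
        for index, p in enumerate(pages, start=1)
    ]


if __name__ == "__main__":
    sample = {"pages": [{"page_no": 1, "text": "Intro"}, {"page_no": 2, "text": "Methods"}]}
    for doc in split_pages(sample, env_flag("DOCLING_EXTRACT_PAGES")):
        print(doc.metadata, doc.page_content)
```

With the flag unset or set to true, each page becomes its own document carrying a page number; with it set to false, the sketch falls back to a single combined text block, which matches the previous behavior described above.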
Screenshots or Videos
Screenshot of page-numbered citations:

(Add screenshot after testing the feature - show chat interface with citations displaying "Page 3 of 10" etc.)
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.