mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 02:48:13 -05:00
[PR #20809] [CLOSED] fix: Docling page number extraction for citations #41421
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/20809
Author: @jannikstdl
Created: 1/20/2026
Status: ❌ Closed
Base: main ← Head: fix/docling-page-extraction
📝 Commits (10+)
b464b48 Merge pull request #20581 from Classic298/fix/db-pool-memory-update
3fc8661 fix(db): CRITICAL - prevent pool exhaustion in memory /reset (#20580)
182d5e8 fix(db): release connection before embedding in process_files_batch (#20576)
826e9ab fix(db): release connection before embeddings in knowledge /metadata/reindex (#20577)
2426257 fix(db): release connection before embedding in memory /add (#20578)
d0c2bfd fix(db): release connection before LLM call in OpenAI /chat/completions (#20572)
0b5aa6d fix(db): release connection before LLM call in Ollama /api/chat (#20571)
2faab40 i18n(pl-PL): Add missing keys and update existing translations (#20562)
84263fc i18n: Updated the Catalan translation file (#20566)
24044b4 fix(db): release connection before LLM call in Ollama /v1/chat/completions (#20569)
📊 Changes
43 files changed (+1073 additions, -760 deletions)
📝 backend/open_webui/config.py (+4 -0)
📝 backend/open_webui/env.py (+7 -1)
📝 backend/open_webui/models/groups.py (+7 -6)
📝 backend/open_webui/models/knowledge.py (+3 -0)
📝 backend/open_webui/models/models.py (+3 -0)
📝 backend/open_webui/retrieval/loaders/main.py (+35 -5)
📝 backend/open_webui/retrieval/vector/dbs/weaviate.py (+11 -3)
📝 backend/open_webui/routers/auths.py (+4 -0)
📝 backend/open_webui/routers/channels.py (+2 -6)
📝 backend/open_webui/routers/files.py (+2 -0)
📝 backend/open_webui/routers/knowledge.py (+22 -13)
📝 backend/open_webui/routers/memories.py (+24 -8)
📝 backend/open_webui/routers/models.py (+2 -4)
📝 backend/open_webui/routers/ollama.py (+15 -9)
📝 backend/open_webui/routers/openai.py (+5 -3)
📝 backend/open_webui/routers/retrieval.py (+6 -2)
📝 backend/open_webui/routers/users.py (+2 -4)
📝 backend/open_webui/tools/builtin.py (+180 -1)
📝 backend/open_webui/utils/auth.py (+2 -1)
📝 backend/open_webui/utils/tools.py (+11 -0)
...and 23 more files
📄 Description
Summary
md_page_break_placeholder parameter to split markdown content by page while preserving formatting
Problem
Page numbers from Docling-processed PDFs were not showing in CitationModal because the markdown output had no page boundary information.
Solution
Request Docling to insert page break markers (<!-- DOCLING_PAGE_BREAK -->) between pages in the markdown output, then split on those markers to create one document per page with the page metadata field.
This preserves:
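The splitting step described under Solution can be sketched roughly as follows. This is an illustration only, not the code from the PR: the names PageDocument and split_markdown_by_page are hypothetical, a plain dataclass stands in for whatever document type the loader actually uses, and the choice of 1-based page numbers is an assumption, since the description does not say which base the citation UI expects.

```python
from dataclasses import dataclass, field

# Marker text taken from the PR description.
PAGE_BREAK = "<!-- DOCLING_PAGE_BREAK -->"


@dataclass
class PageDocument:
    """Hypothetical stand-in for the loader's document type."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_markdown_by_page(
    markdown: str,
    marker: str = PAGE_BREAK,
    first_page: int = 1,  # assumption: 1-based numbering
) -> list[PageDocument]:
    """Split Docling markdown on page break markers, one document per page."""
    docs = []
    for offset, chunk in enumerate(markdown.split(marker)):
        text = chunk.strip()
        if not text:
            # Skip blank pages, but keep counting so later pages
            # still carry their original page number.
            continue
        docs.append(
            PageDocument(page_content=text, metadata={"page": first_page + offset})
        )
    return docs
```

Because the text between markers is passed through unchanged apart from trimming surrounding whitespace, headings, tables, and other markdown inside each page survive the split.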
Test plan
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.