mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 03:18:23 -05:00
issue: Docling and Tika are not adding page numbers to vector store document metadata #5216
Originally created by @sreesdas on GitHub (May 18, 2025).
Installation Method
Docker
Open WebUI Version
0.6.9
Ollama Version (if applicable)
0.7.0
Operating System
macOS Sonoma
Browser (if applicable)
No response
Expected Behavior
The Docling and Tika document extraction engines do not store page number metadata in the vector store, whereas the default engine and Mistral OCR retain it.
Actual Behavior
The "page", "page_label", and "total_pages" metadata fields are absent from the vector store when the document extraction engine is set to Docling or Tika.
Steps to Reproduce
Logs & Screenshots
Vector store snapshot when Mistral OCR is selected:
Vector store snapshot when Docling or Tika is selected:
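One way to confirm the difference between engines is to inspect the metadata of the stored chunks. Below is a minimal sketch in Python, assuming each chunk's metadata is available as a plain dict. The helper name and the sample dicts are hypothetical and not part of Open WebUI; only the three field names come from this issue.

```python
# Metadata keys named in this issue.
PAGE_FIELDS = ("page", "page_label", "total_pages")


def missing_page_fields(metadata: dict) -> list:
    """Return the page-related keys that are absent from one chunk's metadata."""
    return [field for field in PAGE_FIELDS if field not in metadata]


# Hypothetical metadata dicts, made up for illustration only.
with_pages = {"source": "report.pdf", "page": 0, "page_label": "1", "total_pages": 12}
without_pages = {"source": "report.pdf"}

print(missing_page_fields(with_pages))
print(missing_page_fields(without_pages))
```

An empty list means the chunk carries all three page fields. A non-empty list names the fields that the extraction engine did not pass through.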
Additional Information
No response
@tjbck commented on GitHub (May 18, 2025):
Intended behaviour, however PR welcome.