Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-06 10:58:17 -05:00)
[GH-ISSUE #13137] feat: Knowledge Base Enhancement: Support for Image Import, OCR Recognition, and Image Display #55488
Originally created by @belugaming on GitHub (Apr 22, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13137
Problem Description
Currently, the knowledge base in Open WebUI only supports text-based content. This limits its utility when working with visual information. Users are unable to import images, have them OCR-processed, or have the model display images from the knowledge base when responding to queries. This creates a gap in functionality when dealing with documents containing visual elements, diagrams, charts, or screenshots that contain important information.
Desired Solution
Implement comprehensive support for images in the knowledge base with three main capabilities:
1. Allow users to import image files (JPG, PNG, etc.) into the knowledge base
2. Automatically perform OCR on these images to extract text content while maintaining the association between text and source images
3. Enable models to display images from the knowledge base directly in conversations using Markdown image syntax (similar to how Python-generated charts can be displayed)
When a user asks a question related to knowledge base content that includes images, the model should be able to retrieve relevant images and display them inline within its response, rather than just describing them.
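The retrieve-and-display flow described above can be sketched as follows. All names here are hypothetical illustrations (Open WebUI's actual chunk model and file URL scheme may differ): each knowledge-base chunk keeps its OCR text for vector search plus a reference to the stored source image, and the final answer embeds those images with standard Markdown image syntax.

```python
from dataclasses import dataclass


@dataclass
class ImageChunk:
    """A knowledge-base chunk whose text was extracted via OCR from an image."""
    text: str          # OCR-extracted text, used for retrieval/vector search
    file_id: str       # id of the stored source image file
    caption: str = ""  # optional human-readable caption


def render_with_images(answer: str, chunks: list[ImageChunk]) -> str:
    """Append the source images of retrieved chunks as Markdown images,
    so the reply shows them inline (the URL scheme is an assumption)."""
    parts = [answer]
    for c in chunks:
        parts.append(f"![{c.caption or 'source image'}](/api/v1/files/{c.file_id}/content)")
    return "\n\n".join(parts)


chunk = ImageChunk(text="Q3 revenue grew 12%", file_id="abc123", caption="Q3 chart")
print(render_with_images("Revenue grew 12% in Q3.", [chunk]))
```

With no image chunks retrieved, the answer passes through unchanged, so text-only knowledge bases are unaffected.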
Alternatives Considered
- Using external OCR tools and manually uploading text extracted from images
- Only storing image URLs rather than the images themselves
- Limiting the knowledge base to text only and having users describe images manually
These alternatives all create additional work for users and don't provide the seamless experience of having images directly available within the knowledge base system.
Additional Context
This feature would greatly enhance the multimodal capabilities of Open WebUI, allowing it to work more effectively with visual content. Modern AI systems are increasingly multimodal, and this enhancement would keep Open WebUI aligned with this trend while providing significant practical benefits for users working with documents that combine text and images.
@diwakar-s-maurya commented on GitHub (May 24, 2025):
If this feature gets selected for implementation, I recommend looking at this UX for showing image and OCR-text side by side.
https://mistral.ai/solutions/document-ai

@aelahi1998 commented on GitHub (May 24, 2025):
I’ve implemented a fork of the existing Docling integration that enhances chunk coherence by switching from token-based to tag-based splitting; this preserves table structures, which token chunking was breaking apart. Currently, the Markdown output still uses placeholder tags for images, and I haven’t yet tackled image/figure chunks due to limited front-end experience. Format-based chunking that also keeps the images would be huge for anyone who needs to quickly verify that document loading and retrieval aren’t introducing errors, while also being able to check and reference figures, which are currently omitted entirely.
If we instead grab the JSON output: every retrieved element includes its bounding box and page tag, and the endpoint returns full-page images encoded in Base64. To leverage this, I propose adding an optional “page-based” chunking mode in the Docling API that groups all content by page and embeds its corresponding Base64 image alongside.
Looking further ahead, we could introduce “feature-based” chunking—separating text blocks, figures, and tables into discrete chunks. In that model, we’d extract each chunk’s bounding box and dynamically clip the full-page Base64 image to display only the relevant region (e.g. a chart or table). This would give downstream applications both the textual and visual context in a single, unified payload while allowing the vector search to be format-aware and maintaining chunk coherence.
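The "page-based" mode proposed above can be sketched with plain dictionaries. This assumes a Docling-style JSON payload where each element carries a page number and text, and the endpoint returns one Base64-encoded image per page; field names here are illustrative, not Docling's exact schema.

```python
from collections import defaultdict


def chunk_by_page(elements: list[dict], page_images: dict[int, str]) -> list[dict]:
    """Group extracted elements by page and attach the page's Base64 image,
    so each chunk carries both textual and visual context."""
    pages: dict[int, list[str]] = defaultdict(list)
    for el in elements:
        pages[el["page"]].append(el["text"])
    return [
        {"page": p, "text": "\n".join(texts), "image_b64": page_images.get(p)}
        for p, texts in sorted(pages.items())
    ]


elements = [
    {"page": 1, "text": "Intro paragraph", "bbox": [0, 0, 100, 20]},
    {"page": 1, "text": "Figure 1 caption", "bbox": [0, 30, 100, 40]},
    {"page": 2, "text": "Results table", "bbox": [0, 0, 100, 50]},
]
chunks = chunk_by_page(elements, {1: "iVBOR...", 2: "iVBOR..."})
```

The later "feature-based" mode would refine this by splitting on element type and using each element's `bbox` to clip the page image down to just the relevant region before embedding it.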
Does any of this sound useful?
@rgaricano commented on GitHub (May 25, 2025):
Another option is to integrate doctr (https://github.com/mindee/doctr) as a tool for preprocessing image-based docs, then handle the fine-tuned interpretation within conversation/prompting.
@Hisma commented on GitHub (May 29, 2025):
Marker API support was just added as well, and it supports image OCR. That, in addition to Docling and Mistral, gives you multiple options for image OCR.
I'm obviously a fan of Marker since I found it to outperform both Docling and Mistral, which led me to create the PR to add the feature:
https://github.com/open-webui/open-webui/pull/14311
That said, it would be nice to have a UI for displaying image and OCR result side-by-side as suggested.
@ER-EPR commented on GitHub (Jun 3, 2025):
Will it be possible to self-host marker and use it in openwebui?
@adityapandey216 commented on GitHub (Jun 3, 2025):
I thought this is controlled using the PDF_EXTRACT_IMAGES flag? Or is that not the case?
@adityapandey216 commented on GitHub (Jun 3, 2025):
https://github.com/open-webui/open-webui/pull/13085
@Hisma commented on GitHub (Jun 3, 2025):
Possibly, but I haven't tried. If so, you would need to set up a Marker server (there's an example server in the repo) and use the "external OCR engine" option, pointing it to your local Marker endpoint URL.
@nukikordzaia20 commented on GitHub (Jun 23, 2025):
Hi, is this pull request active?
@GeorgelPreput commented on GitHub (Jun 27, 2025):
Sorry for the inline wall-of-text, but I haven't set up a repo for this -- tried it out a couple of months ago and seemed to work. I'd like to also automate it via GH Actions, but ain't got the time. Here's a recipe for building (and running) Marker locally:
Dockerfile
pyproject-cpu.toml
pyproject-gpu.toml
CPU-only image
build via:
run via:
GPU-enabled image
build via:
run via:
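The specific commands aren't reproduced here; as a generic illustration only (image tags, port, and volume name are hypothetical, not the recipe's actual values), a build/run pair for such images typically looks like:

```shell
# Build the CPU-only image from the Dockerfile in the current directory
docker build -t marker-local:cpu .

# Run it, exposing the API port and persisting downloaded models in a volume
docker run -d -p 8000:8000 -v marker-models:/models marker-local:cpu

# The GPU-enabled variant additionally needs the NVIDIA container runtime
docker run -d --gpus all -p 8000:8000 -v marker-models:/models marker-local:gpu
```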
Usage
@Hisma: would really appreciate a patch to the Marker feature that allows for a custom API base URL
@Hisma commented on GitHub (Jun 28, 2025):
True, forcing people to use datalab is quite limiting, I agree. It's nice because it's "set and forget", cheap ($25/month for arguably the best OCR engine on the market), and doesn't require the end user to set up their own local server. Also their hosted solution is pretty beefy, no need to have GPU horsepower. That said, it would still be ideal to allow users to self-host.
Let me look into what it'd take to add this feature.
Also they've added some new flags since I released the add-on and deprecated others (language selection), so it warrants an update anyway.
@Hisma commented on GitHub (Jun 28, 2025):
I also realize this is a separate issue that's not addressed by marker - openwebui doesn't allow you to add images to a knowledgebase, even if the parser engine supports it.
This is a pretty big missing feature. Is anyone working on this? @tjbck
If not, I can try to see what it'd take to implement. I know a hacky work-around to get it going quickly (right now only external doc parser supports image uploads - so just append other image-supporting doc parsers to that section of code and you'll get image support - tested and confirmed this works).
But ideally, someone should refactor all the available doc parsers to include a list of what available doc formats that particular doc parser supports, and when that parser is selected, openwebui allows those formats to be uploaded to a kb built w/ that parser.
I already have a supported-format list implemented in the marker API parser code, but owui doesn't pass the list as a parameter when choosing supported formats; owui just hardcodes a very small subset of text-based file formats for RAG KBs. This is fixable, but some parsers (like Tika) support TONS of file formats depending on which version you install. It could get messy.
There's probably a solid middle ground. But I really think this issue needs to be addressed.
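The refactor proposed above could look something like the sketch below: a registry where each content-extraction engine declares the extensions it accepts, which upload validation then consults. Parser names and format lists here are illustrative assumptions, not Open WebUI's actual code.

```python
import os

# Hypothetical registry: each extraction engine declares what it accepts.
PARSER_SUPPORTED_FORMATS: dict[str, set[str]] = {
    "default": {".txt", ".md", ".pdf"},
    "marker":  {".pdf", ".docx", ".pptx", ".jpg", ".png"},
    "docling": {".pdf", ".docx", ".html", ".jpg", ".png"},
    "tika":    {".pdf", ".doc", ".odt", ".rtf"},  # real Tika handles far more
}


def allowed_formats(parser: str) -> set[str]:
    """Formats a knowledge base built with `parser` should accept for upload."""
    return PARSER_SUPPORTED_FORMATS.get(parser, PARSER_SUPPORTED_FORMATS["default"])


def can_upload(parser: str, filename: str) -> bool:
    """Gate uploads on the selected parser's declared capabilities."""
    return os.path.splitext(filename)[1].lower() in allowed_formats(parser)
```

An unknown parser falls back to the current text-only subset, which keeps behavior unchanged for engines that never declare a list; the "messy" Tika case could be handled with a capped or configurable set.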
@Hisma commented on GitHub (Jul 21, 2025):
Sure, sorry it's been a while. Working on this now, as there have also been some changes to the datalab API I need to incorporate.
@Hisma commented on GitHub (Jul 21, 2025):
PR with configurable API url is ready for review @GeorgelPreput. https://github.com/open-webui/open-webui/pull/15903
Also includes updates to the datalab API spec.
@Arokha commented on GitHub (Mar 25, 2026):
It would be great to add image support. Not 'a document as an image' that needs OCR, but I mean a jpg, png, etc, of a landscape or anything else that is not a 'document' but may be pertinent to other documents in the same KB for vision models to analyze.
@HenkieTenkie62 commented on GitHub (Apr 30, 2026):
@GeorgelPreput Bit unrelated for here maybe but I was delighted to find you made a new image for Marker on Dockerhub.
But I can't seem to get it to boot.
Without running it with "user: root" it won't download texify and the text recognition model.
After that it exits with error:
@GeorgelPreput commented on GitHub (Apr 30, 2026):
Hey @HenkieTenkie62, I'll look into it. Are you on ARM or x86?
@HenkieTenkie62 commented on GitHub (Apr 30, 2026):
@GeorgelPreput Thanks! x86 - 64
@GeorgelPreput commented on GitHub (Apr 30, 2026):
@HenkieTenkie62 try again, please? I assume you have a mounted volume on your machine for the downloaded models, the rights to which gave you errors.
The issue was that the GPU variant (based on an NVIDIA CUDA image) had two built-in users, `root` (id 0) and `ubuntu` (id 1000). When the Dockerfile created the user `app`, used for running Marker, that user got id 1001 instead of 1000. I've removed the `ubuntu` user, so the `app` user has id 1000, just like for the CPU version. Since I suspect your own user (on the machine running the container) also has user id 1000 (verify by typing `id` in the terminal), everything should work fine 🤞 If your user has some other id, then you can do one of the following options below. I'm going to assume in this case that your user id is 1337, for example:
-- OR --
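Generically, two common ways to reconcile a host uid (1337 in the example) with a container user of uid 1000 when a model volume is mounted look like this; image name and paths are illustrative, not necessarily the options originally given:

```shell
# Option A: run the container as your own uid/gid, so files it writes
# to the mounted volume are owned by you
docker run -d --user 1337:1337 -v ./models:/models marker-local:gpu

# -- OR --

# Option B: keep the container's uid 1000 and hand the mounted
# directory over to that uid instead
sudo chown -R 1000:1000 ./models
docker run -d -v ./models:/models marker-local:gpu
```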
The above comes with a "works on my machine" certificate, but the Docker images should be fixed; I just finished pushing them. Also, since I am working on this on my private GitLab, I made a mirror of the repo producing the Docker images on GitHub over here. Maybe we can leave this discussion to rest as far as details of the Marker image go; happy to answer any questions / fix issues in the appropriate repo for it.