[GH-ISSUE #13137] feat: Knowledge Base Enhancement: Support for Image Import, OCR Recognition, and Image Display #16822

Open
opened 2026-04-19 22:38:49 -05:00 by GiteaMirror · 15 comments
Owner

Originally created by @belugaming on GitHub (Apr 22, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13137

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

Currently, the knowledge base in Open WebUI only supports text-based content. This limits its utility when working with visual information. Users are unable to import images, have them OCR-processed, or have the model display images from the knowledge base when responding to queries. This creates a gap in functionality when dealing with documents containing visual elements, diagrams, charts, or screenshots that contain important information.

Desired Solution

Implement comprehensive support for images in the knowledge base with three main capabilities:

  • Allow users to import image files (JPG, PNG, etc.) into the knowledge base.
  • Automatically perform OCR on these images to extract text content while maintaining the association between text and source images.
  • Enable models to display images from the knowledge base directly in conversations using Markdown image syntax (similar to how Python-generated charts can be displayed).

When a user asks a question related to knowledge base content that includes images, the model should be able to retrieve the relevant images and display them inline within its response, rather than just describing them.
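As a rough illustration of the third capability, here is a minimal sketch of how a retrieved knowledge-base image could be turned into inline Markdown. All names here (`KBImage`, `render_image_reference`, the `/api/v1/files` path) are hypothetical, not actual Open WebUI APIs:

```python
# Hypothetical sketch: turning a retrieved knowledge-base image hit into
# Markdown that a model response could embed inline. The class and function
# names, and the file-serving URL, are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class KBImage:
    file_id: str       # id of the stored image file in the knowledge base
    ocr_text: str      # text extracted from the image by OCR
    caption: str = ""  # optional human-written caption


def render_image_reference(img: KBImage, base_url: str = "/api/v1/files") -> str:
    """Build a Markdown image tag pointing at the stored file, reusing the
    OCR text as alt text so the image stays searchable and accessible."""
    alt = img.caption or img.ocr_text[:80]
    return f"![{alt}]({base_url}/{img.file_id}/content)"


# Example: a model response could embed this string directly.
print(render_image_reference(KBImage("abc123", "Q2 revenue chart")))
```

Keeping the OCR text attached to the stored file (rather than replacing the file with text) is what lets the same record serve both retrieval and inline display.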

Alternatives Considered

  • Using external OCR tools and manually uploading the text extracted from images
  • Storing only image URLs rather than the images themselves
  • Limiting the knowledge base to text only and having users describe images manually

These alternatives all create additional work for users and don't provide the seamless experience of having images directly available within the knowledge base system.

Additional Context

This feature would greatly enhance the multimodal capabilities of Open WebUI, allowing it to work more effectively with visual content. Modern AI systems are increasingly multimodal, and this enhancement would keep Open WebUI aligned with this trend while providing significant practical benefits for users working with documents that combine text and images.


@diwakar-s-maurya commented on GitHub (May 24, 2025):

If this feature gets selected for implementation, I recommend looking at this UX for showing image and OCR-text side by side.

https://mistral.ai/solutions/document-ai
![Image](https://github.com/user-attachments/assets/74ab0a03-8a64-4215-8b4b-0014acaf9775)


@aelahi1998 commented on GitHub (May 24, 2025):

I’ve implemented a fork of the existing Docling integration that enhances chunk coherence by switching from token-based to tag-based splitting, which preserves table structures (token chunking was breaking them apart). Currently, the Markdown output still uses placeholder tags for images, and I haven’t yet tackled image/figure chunks due to limited front-end experience. If we could get format-based chunking and also keep the images, it would be huge for anyone who needs to quickly verify that document loading and retrieval aren't introducing errors, while also being able to quickly check and reference figures, which are currently omitted entirely.

If we instead grab the JSON output: every retrieved element includes its bounding box and page tag, and the endpoint returns full-page images encoded in Base64. To leverage this, I propose adding an optional “page-based” chunking mode in the Docling API that groups all content by page and embeds its corresponding Base64 image alongside.

Looking further ahead, we could introduce “feature-based” chunking—separating text blocks, figures, and tables into discrete chunks. In that model, we’d extract each chunk’s bounding box and dynamically clip the full-page Base64 image to display only the relevant region (e.g. a chart or table). This would give downstream applications both the textual and visual context in a single, unified payload while allowing the vector search to be format-aware and maintaining chunk coherence.

Does any of this sound useful?
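The proposed "page-based" chunking mode can be sketched in a few lines. This is a minimal illustration assuming a Docling-style JSON payload where each element carries a page number and bounding box and each page has a Base64-encoded render; the field names (`page`, `text`, `bbox`) are illustrative, not the actual Docling schema:

```python
# Sketch of page-based chunking: group parsed elements by page and attach
# that page's Base64 image, so each chunk carries both textual and visual
# context. Field names are assumptions, not the real Docling JSON schema.
import base64
from collections import defaultdict


def chunk_by_page(elements, page_images):
    """Group extracted elements by page; embed the page image alongside."""
    pages = defaultdict(list)
    for el in elements:
        pages[el["page"]].append(el["text"])
    return [
        {
            "page": page,
            "text": "\n".join(texts),
            "image_b64": page_images.get(page),  # full-page render, Base64
        }
        for page, texts in sorted(pages.items())
    ]


# Toy input standing in for a parsed two-page document.
elements = [
    {"page": 1, "text": "Intro paragraph", "bbox": [0, 0, 100, 20]},
    {"page": 1, "text": "Figure 1 caption", "bbox": [0, 30, 100, 40]},
    {"page": 2, "text": "Table 1", "bbox": [0, 0, 100, 50]},
]
page_images = {
    1: base64.b64encode(b"fake-png-1").decode(),
    2: base64.b64encode(b"fake-png-2").decode(),
}
chunks = chunk_by_page(elements, page_images)
```

The later "feature-based" mode would differ only in the grouping key: instead of the page number, elements would be split per figure/table/text block, and the bounding box would be used to clip the page image down to the relevant region.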


@rgaricano commented on GitHub (May 25, 2025):

Another way is to integrate doctr (https://github.com/mindee/doctr) as a tool to preprocess image-based docs, and handle the fine-tuning of interpretation within conversation/prompting.


@Hisma commented on GitHub (May 29, 2025):

Marker API support was just added as well, which supports image OCR. That, in addition to docling and mistral, gives you multiple options for image OCR.

I'm obviously a fan of marker since I found it to outperform both docling and mistral, which led me to create the PR to add the feature:
https://github.com/open-webui/open-webui/pull/14311

That said, it would be nice to have a UI for displaying image and OCR result side-by-side as suggested.


@ER-EPR commented on GitHub (Jun 3, 2025):

> marker API support was just added as well which supports image OCR. That in addition to docling and mistral gives you multiple options for image OCR.
>
> I'm obviously a fan of marker since I found it to outperform both docling and mistral which led me to create the PR to add the feature - #14311
>
> That said, it would be nice to have a UI for displaying image and OCR result side-by-side as suggested.

Will it be possible to self-host marker and use it in openwebui?


@adityapandey216 commented on GitHub (Jun 3, 2025):

I thought this is controlled using the PDF_EXTRACT_IMAGES flag? Or is that not the case?


@adityapandey216 commented on GitHub (Jun 3, 2025):

https://github.com/open-webui/open-webui/pull/13085


@Hisma commented on GitHub (Jun 3, 2025):

> > marker API support was just added as well which supports image OCR. That in addition to docling and mistral gives you multiple options for image OCR.
> >
> > I'm obviously a fan of marker since I found it to outperform both docling and mistral which led me to create the PR to add the feature - #14311
> >
> > That said, it would be nice to have a UI for displaying image and OCR result side-by-side as suggested.
>
> Will it be possible to self-host marker and use it in openwebui?

Possibly, but I haven't tried. If it's possible, you would need to set up a marker server (there's an example server in the repo), then use the "external OCR engine" option and point it to your local marker endpoint URL.


@nukikordzaia20 commented on GitHub (Jun 23, 2025):

Hi, is this pull request active?


@GeorgelPreput commented on GitHub (Jun 27, 2025):

> Possibly, but I haven't tried. If possible you would need to set up a marker server (there's an example server in the git), and use the "external ocr engine" option and point it to your local marker endpoint URL.

Sorry for the inline wall-of-text, but I haven't set up a repo for this -- tried it out a couple of months ago and seemed to work. I'd like to also automate it via GH Actions, but ain't got the time. Here's a recipe for building (and running) Marker locally:

Dockerfile

# For a CPU-only build, use the following build command:
#   docker build --build-arg BASE_IMAGE=python:3.13-slim-bookworm --build-arg BUILD_TYPE=cpu -t marker:cpu .

# For a CUDA-enabled build, use the following build command:
#   docker build --build-arg BASE_IMAGE=nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 --build-arg BUILD_TYPE=gpu -t marker:gpu .

ARG BASE_IMAGE=python:3.13-slim-bookworm
ARG BUILD_TYPE=cpu
FROM ${BASE_IMAGE} AS builder

ARG BUILD_TYPE
LABEL build_type=${BUILD_TYPE}

RUN apt-get update && apt-get install --no-install-recommends -y \
    build-essential curl wget python3 python-is-python3 software-properties-common && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# CUDA Keyring for Ubuntu doesn't get accepted, but the Debian 12 one is. Go figure.
RUN if [ "$BUILD_TYPE" = "gpu" ]; then \
    wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    rm cuda-keyring_1.1-1_all.deb && \
    apt-get update && \
    apt-get -y install libcusparselt0 libcusparselt-dev cuda-toolkit-12-6; \
    else \
    echo "Skipping cuSPARSELt and CUPTI installation for non-GPU build"; \
    fi

RUN useradd --create-home --shell /bin/bash app
ENV HOME=/home/app

USER app
WORKDIR /home/app

ADD --chown=app:app https://astral.sh/uv/install.sh install.sh
RUN chmod -R 755 install.sh && ./install.sh && rm install.sh

ENV PATH="/home/app/.local/bin:${PATH}"

COPY pyproject-${BUILD_TYPE}.toml pyproject.toml
RUN uv sync

ENV PATH="/home/app/.venv/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda-12.6/lib64:${LD_LIBRARY_PATH}"
ENV CUDA_HOME="/usr/local/cuda-12.6"

EXPOSE 8001

CMD ["marker_server", "--host", "0.0.0.0", "--port", "8001"]

pyproject-cpu.toml

[project]
name = "marker"
version = "1.6.2"
description = "Marker converts documents to markdown, JSON, and HTML quickly and accurately."
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "fastapi",
    "marker-pdf[full]",
    "python-multipart",
    "starlette",
    "torch",
    "torchvision",
    "torchaudio",
    "uvicorn",
]

[tool.uv.sources]
torch = { index = "pytorch" }
torchvision = { index = "pytorch" }
torchaudio = { index = "pytorch" }

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

pyproject-gpu.toml

[project]
name = "marker"
version = "1.6.2"
description = "Marker converts documents to markdown, JSON, and HTML quickly and accurately."
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "fastapi",
    "marker-pdf[full]",
    "python-multipart",
    "starlette",
    "torch",
    "torchvision",
    "torchaudio",
    "uvicorn",
]

[tool.uv.sources]
torch = { index = "pytorch" }
torchvision = { index = "pytorch" }
torchaudio = { index = "pytorch" }

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu126"
explicit = true

CPU-only image
  • build via:

    docker build --build-arg BASE_IMAGE=python:3.13-slim-bookworm --build-arg BUILD_TYPE=cpu -t marker:cpu .
    
  • run via:

    docker run -p 8001:8001 marker:cpu
    
GPU-enabled image
  • build via:

    docker build --build-arg BASE_IMAGE=nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 --build-arg BUILD_TYPE=gpu -t marker:gpu .
    
  • run via:

    docker run --runtime=nvidia --gpus=all -it -d -p 8001:8001 marker:gpu
    

Usage

curl -X POST -F 'file=@/home/user/sample.pdf' http://localhost:8001/marker/upload > ~/sample.json

@Hisma: would really appreciate a patch to the Marker feature that allows for a custom API base URL


@Hisma commented on GitHub (Jun 28, 2025):

> > Possibly, but I haven't tried. If possible you would need to set up a marker server (there's an example server in the git), and use the "external ocr engine" option and point it to your local marker endpoint URL.
>
> Sorry for the inline wall-of-text, but I haven't set up a repo for this -- tried it out a couple of months ago and seemed to work. I'd like to also automate it via GH Actions, but ain't got the time. Here's a recipe for building (and running) Marker locally:
>
> [Dockerfile, pyproject-cpu.toml, pyproject-gpu.toml, and the build/run/usage commands quoted in full above]
>
> @Hisma: would really appreciate a patch to the Marker feature that allows for a custom API base URL

True, forcing people to use datalab is quite limiting, I agree. It's nice because it's "set and forget", cheap ($25/month for arguably the best OCR engine on the market), and doesn't require the end user to set up their own local server. Also their hosted solution is pretty beefy, no need to have GPU horsepower. That said, it would still be ideal to allow users to self-host.
Let me look into what it'd take to add this feature.
Also they've added some new flags since I released the add-on and deprecated others (language selection), so it warrants an update anyway.


@Hisma commented on GitHub (Jun 28, 2025):

I also realize this is a separate issue that's not addressed by marker - openwebui doesn't allow you to add images to a knowledgebase, even if the parser engine supports it.
This is a pretty big missing feature. Is anyone working on this? @tjbck
If not, I can try to see what it'd take to implement. I know a hacky work-around to get it going quickly (right now only external doc parser supports image uploads - so just append other image-supporting doc parsers to that section of code and you'll get image support - tested and confirmed this works).
But ideally, someone should refactor all the available doc parsers to include a list of what available doc formats that particular doc parser supports, and when that parser is selected, openwebui allows those formats to be uploaded to a kb built w/ that parser.
I already have a supported format list implemented in the marker api parser code, but owui doesn't pass the list as a parameter when choosing supuported formats - owui just hardcodes a very small subset of text-based file formats for RAG kbs. This is fixable, but some parsers (like tika) support TONS of file formats depending on which version you install. It could get messy.
There's probably a solid middle ground. But I really think this issue needs to be addressed.
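The refactor described above could be sketched roughly as follows: each parser declares the extensions it supports, and the upload check consults the active parser's list instead of a hardcoded global set. All names here (`PARSER_SUPPORTED_FORMATS`, `is_upload_allowed`) and the per-parser format lists are illustrative, not actual Open WebUI identifiers.

```python
# Hypothetical sketch: a registry of document parsers and the file
# extensions each one claims to support. The upload endpoint would
# check the currently selected parser's list rather than a single
# hardcoded set of text formats.
PARSER_SUPPORTED_FORMATS = {
    # text-only default parser (illustrative subset)
    "default": {".txt", ".md", ".pdf", ".docx"},
    # marker also handles images
    "marker": {".txt", ".md", ".pdf", ".docx", ".png", ".jpg", ".jpeg"},
    # tika supports a very large set; partial list for illustration
    "tika": {".pdf", ".docx", ".odt", ".rtf", ".png", ".jpg"},
}

def is_upload_allowed(filename: str, active_parser: str) -> bool:
    """Return True if the active parser claims support for this file type."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in PARSER_SUPPORTED_FORMATS.get(active_parser, set())

print(is_upload_allowed("diagram.png", "marker"))   # True
print(is_upload_allowed("diagram.png", "default"))  # False
```

A middle ground, as suggested, might be to let each parser advertise its list but cap or curate what the UI actually offers for the messier parsers.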


@Hisma commented on GitHub (Jul 21, 2025):

Possibly, but I haven't tried. If possible you would need to set up a marker server (there's an example server in the git), and use the "external ocr engine" option and point it to your local marker endpoint URL.

Sorry for the inline wall-of-text, but I haven't set up a repo for this -- tried it out a couple of months ago and seemed to work. I'd like to also automate it via GH Actions, but ain't got the time. Here's a recipe for building (and running) Marker locally:

#### Dockerfile

```dockerfile
# For a CPU-only build, use the following build command:
# docker build --build-arg BASE_IMAGE=python:3.13-slim-bookworm --build-arg BUILD_TYPE=cpu -t marker:cpu .

# For a CUDA-enabled build, use the following build command:
# docker build --build-arg BASE_IMAGE=nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 --build-arg BUILD_TYPE=gpu -t marker:gpu .

ARG BASE_IMAGE=python:3.13-slim-bookworm
ARG BUILD_TYPE=cpu
FROM ${BASE_IMAGE} AS builder

ARG BUILD_TYPE
LABEL build_type=${BUILD_TYPE}

RUN apt-get update && apt-get install --no-install-recommends -y \
    build-essential curl wget python3 python-is-python3 software-properties-common && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# CUDA keyring for Ubuntu doesn't get accepted, but the Debian 12 one is. Go figure.
RUN if [ "$BUILD_TYPE" = "gpu" ]; then \
        wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb && \
        dpkg -i cuda-keyring_1.1-1_all.deb && \
        rm cuda-keyring_1.1-1_all.deb && \
        apt-get update && \
        apt-get -y install libcusparselt0 libcusparselt-dev cuda-toolkit-12-6; \
    else \
        echo "Skipping cuSPARSELt and CUPTI installation for non-GPU build"; \
    fi

RUN useradd --create-home --shell /bin/bash app
ENV HOME=/home/app

USER app
WORKDIR /home/app

ADD --chown=app:app https://astral.sh/uv/install.sh install.sh
RUN chmod -R 755 install.sh && ./install.sh && rm install.sh

ENV PATH="/home/app/.local/bin:${PATH}"

COPY pyproject-${BUILD_TYPE}.toml pyproject.toml
RUN uv sync

ENV PATH="/home/app/.venv/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda-12.6/lib64:${LD_LIBRARY_PATH}"
ENV CUDA_HOME="/usr/local/cuda-12.6"

EXPOSE 8001

CMD ["marker_server", "--host", "0.0.0.0", "--port", "8001"]
```

#### pyproject-cpu.toml

```toml
[project]
name = "marker"
version = "1.6.2"
description = "Marker converts documents to markdown, JSON, and HTML quickly and accurately."
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "fastapi",
    "marker-pdf[full]",
    "python-multipart",
    "starlette",
    "torch",
    "torchvision",
    "torchaudio",
    "uvicorn",
]

[tool.uv.sources]
torch = { index = "pytorch" }
torchvision = { index = "pytorch" }
torchaudio = { index = "pytorch" }

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"
explicit = true
```

#### pyproject-gpu.toml

```toml
[project]
name = "marker"
version = "1.6.2"
description = "Marker converts documents to markdown, JSON, and HTML quickly and accurately."
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "fastapi",
    "marker-pdf[full]",
    "python-multipart",
    "starlette",
    "torch",
    "torchvision",
    "torchaudio",
    "uvicorn",
]

[tool.uv.sources]
torch = { index = "pytorch" }
torchvision = { index = "pytorch" }
torchaudio = { index = "pytorch" }

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu126"
explicit = true
```

##### CPU-only image

* build via: `docker build --build-arg BASE_IMAGE=python:3.13-slim-bookworm --build-arg BUILD_TYPE=cpu -t marker:cpu .`
* run via: `docker run -p 8001:8001 marker:cpu`

##### GPU-enabled image

* build via: `docker build --build-arg BASE_IMAGE=nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 --build-arg BUILD_TYPE=gpu -t marker:gpu .`
* run via: `docker run --runtime=nvidia --gpus=all -it -d -p 8001:8001 marker:gpu`

## Usage

```shell
curl -X POST -F 'file=@/home/user/sample.pdf' http://localhost:8001/marker/upload > ~/sample.json
```
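For completeness, a standard-library Python equivalent of the curl upload could look like the sketch below. The endpoint path (`/marker/upload`) and port match the example server above; the helper names (`encode_multipart`, `upload`) are illustrative, and the JSON-response assumption follows the `~/sample.json` redirect in the curl example.

```python
# Minimal multipart/form-data upload using only the standard library,
# mirroring: curl -X POST -F 'file=@...' http://localhost:8001/marker/upload
import io
import json
import urllib.request
import uuid

def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(f'Content-Disposition: form-data; name="{field}"; '
               f'filename="{filename}"\r\n'.encode())
    body.write(f"Content-Type: {content_type}\r\n\r\n".encode())
    body.write(data)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"

def upload(path: str, url: str = "http://localhost:8001/marker/upload") -> dict:
    """POST a file to the marker server and decode the JSON response."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart("file", path.split("/")[-1], f.read())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# result = upload("/home/user/sample.pdf")  # equivalent to the curl call above
```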
> [@Hisma](https://github.com/Hisma): would _really_ appreciate a patch to the Marker feature that allows for a custom API base URL

Sure, sorry it's been a while; working on this now, as there have been some changes to the datalab API I need to incorporate as well.


@Hisma commented on GitHub (Jul 21, 2025):

PR with configurable API url is ready for review @GeorgelPreput. https://github.com/open-webui/open-webui/pull/15903
Also includes updates to the datalab API spec.


@Arokha commented on GitHub (Mar 25, 2026):

It would be great to add image support. Not "a document as an image" that needs OCR, but a jpg, png, etc. of a landscape or anything else that isn't a "document" but may be pertinent to other documents in the same KB, for vision models to analyze.

Reference: github-starred/open-webui#16822