[GH-ISSUE #11653] Support dots.ocr (rednote-hilab) #69761

Open
opened 2026-05-04 19:07:11 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @Leroy-X on GitHub (Aug 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11653

dots.ocr is a powerful, multilingual document parser that unifies layout detection and content recognition within a single vision-language model while maintaining good reading order. Despite its compact 1.7B-parameter LLM foundation, it achieves state-of-the-art (SOTA) performance.

https://huggingface.co/rednote-hilab/dots.ocr
https://github.com/rednote-hilab/dots.ocr

GiteaMirror added the model label 2026-05-04 19:07:11 -05:00
Author
Owner

@berserker1 commented on GitHub (Aug 6, 2025):

Currently the model runs only through vllm right?


@LuSrackhall commented on GitHub (Aug 16, 2025):

> Currently the model runs only through vllm right?

However, vLLM requires too much VRAM, and many people don't have hardware capable of supporting it.

That said, even without Ollama, you can use dots.ocr through Hugging Face inference.

After setting up the environment, you can use the following command to experience the parsing effect locally:

```bash
python3 demo/demo_hf.py
```

You can replace the image path in the script with the image you want to test.

![Image](https://github.com/user-attachments/assets/f2c49a26-d44f-46b4-81dd-c8af1422f2d2)

Finally, I also hope to be able to experience dots.ocr through Ollama.


@mkvn-1 commented on GitHub (Aug 17, 2025):

> > Currently the model runs only through vllm right?
>
> However, vLLM requires too much VRAM, and many people don't have hardware capable of supporting it.
>
> That said, even without Ollama, you can use dots.ocr through Hugging Face inference.
>
> After setting up the environment, you can use the following command to experience the parsing effect locally:
>
> python3 demo/demo_hf.py
>
> You can replace the image path in the script with the image you want to test.
>
> ![Image](https://github.com/user-attachments/assets/f2c49a26-d44f-46b4-81dd-c8af1422f2d2)
>
> Finally, I also hope to be able to experience dots.ocr through Ollama.

Can you please share a Jupyter notebook showing how to load this model with Transformers? That would be very useful for everyone.


@LuSrackhall commented on GitHub (Aug 18, 2025):

> Can you please share a Jupyter notebook showing how to load this model with Transformers? That would be very useful for everyone.

@mkvn-1 Follow the steps in the official documentation and you should be able to deploy it successfully. My own deployment process is relatively niche, but it also follows the official steps.

If you're interested—my deployment process uses Pixi to manage dependencies.

You can refer to my related repository [DotsOCR](https://github.com/LuSrackhall/DotsOcr) for details. It's very useful when leveraging different cloud-native platforms for free. After cloning the code, run `pixi i` to install dependencies. Then follow the documentation tutorial to download the model weights, and you can run it. I've updated the general process in the repository's README.

Below is a screenshot of a successful deployment on the Tencent CNB platform:
![Image](https://github.com/user-attachments/assets/0944f117-1264-4a92-9ff8-1a7d50eb637f)

![Image](https://github.com/user-attachments/assets/e6e1c2aa-262d-4601-95a6-399ef6245169)

I hope my deployment process can help you.
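
For anyone who still wants the plain Transformers route in a notebook, here is a minimal sketch. It assumes the `AutoModelForCausalLM` / `trust_remote_code=True` loading pattern shown on the Hugging Face model card; the `build_messages` helper and the example image path are illustrative, and the exact image preprocessing and generation wiring should follow the repo's own `demo/demo_hf.py`.

```python
# Hedged sketch, not the official demo. Model id and trust_remote_code
# usage come from the Hugging Face model card; the chat-message payload
# shape follows the common vision-language convention and may differ
# from the repo's demo_hf.py.

def build_messages(image_path: str, prompt: str) -> list:
    """One user turn containing an image reference plus a text prompt,
    in the chat-message shape most VLM processors expect."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def run_demo() -> None:
    # Heavy imports live here so build_messages stays importable offline.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "rednote-hilab/dots.ocr"
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,  # custom architecture ships with the checkpoint
    )
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # "path/to/page.png" is a placeholder; point it at your own document image.
    messages = build_messages("path/to/page.png",
                              "Parse the layout and text of this document.")
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Image preprocessing and generation details are model-specific;
    # consult demo/demo_hf.py in the dots.ocr repo for the exact wiring.
    print(text)


# run_demo()  # uncomment to run; downloads several GB of weights
```

Dropping the two cells (`build_messages` plus `run_demo`) into a notebook should get you most of the way; only the generation step needs to be copied from the official demo script.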

Reference: github-starred/ollama#69761