[GH-ISSUE #5086] TextMonkey model #28969

Open
opened 2026-04-22 07:33:11 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @insinfo on GitHub (Jun 16, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5086

In my quick tests on the demo, this seems to be the best document-understanding and OCR model I have ever tested. My current use case is identifying the process code in 1,500,000 images, which I currently have to do manually (a challenging job), and I am wondering whether this model could do it for me.

For each image like the one below, I need to identify the process code/year.

![572_page-0001](https://github.com/ollama/ollama/assets/12227024/b32b4fc2-a861-4357-b2d2-89de79de8258)
[573.pdf](https://github.com/user-attachments/files/15859661/573.pdf)

https://github.com/Yuliang-Liu/Monkey?tab=readme-ov-file

[TextMonkey](https://arxiv.org/abs/2403.04473)
GiteaMirror added the model label 2026-04-22 07:33:11 -05:00

Reference: github-starred/ollama#28969