[GH-ISSUE #15057] Support for Baidu Qianfan-OCR 4B model #35424

Open
opened 2026-04-22 19:55:31 -05:00 by GiteaMirror · 5 comments

Originally created by @rjmalagon on GitHub (Mar 25, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15057

This is a strong OCR model with reportedly high-quality results, surpassing DeepSeek OCR V2 and Gemini 3/3.1 Pro on some benchmarks.

https://huggingface.co/baidu/Qianfan-OCR

It offers direct image-to-Markdown conversion, support for 192 languages, and "Layout-as-Thought" (described in more detail in the model description).

This multimodal model uses Qwen3 as the LLM and the InternVL-derived Qianfan-ViT as the vision encoder.

GiteaMirror added the model label 2026-04-22 19:55:31 -05:00

@rick-github commented on GitHub (Mar 25, 2026):

https://ollama.com/maternion/Qianfan-OCR


@rjmalagon commented on GitHub (Mar 26, 2026):

Hi, thanks. I already tested it directly from the Hugging Face GGUFs; I was thinking of bringing support to the Ollama engine. Getting the template right is still a puzzle for me (porting the Jinja template is hard for me, but I would like to learn).


@rick-github commented on GitHub (Mar 26, 2026):

What's incorrect with the template?


@JoeLoginIsAlreadyTaken commented on GitHub (Mar 27, 2026):

> https://ollama.com/maternion/Qianfan-OCR

I've tested this model (q8_0) and it just outputs hallucinated gibberish on documents where GLM-OCR and Qwen3.5 work fine.


@rick-github commented on GitHub (Mar 27, 2026):

Requires a vendor sync: https://github.com/ggml-org/llama.cpp/pull/20847

Reference: github-starred/ollama#35424