[GH-ISSUE #6949] Is there a better model that can accurately recognize image information? I downloaded several multimodal models, and none of them recognizes images well #4397

Closed
opened 2026-04-12 15:20:17 -05:00 by GiteaMirror · 5 comments

Originally created by @goactiongo on GitHub (Sep 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6949

What is the issue?

"Using fastgpt --onapi to call local Ollama models, I have downloaded several multimodal models, but the image recognition accuracy is not good. Is there a better model that can accurately recognize image information?"


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.11

GiteaMirror added the question label 2026-04-12 15:20:17 -05:00

@rick-github commented on GitHub (Sep 25, 2024):

Which models have you tried? The current vision-capable models in the ollama library are listed here: https://ollama.com/search?c=vision

qwen2-vl (https://github.com/ollama/ollama/issues/6564) and pixtral (https://github.com/ollama/ollama/issues/6748) are reportedly quite good but are not yet supported in ollama.


@goactiongo commented on GitHub (Sep 25, 2024):

minicpm-v:8b
llava:13b
bakllava
blackened/llama-3-8b-gpt-4o-ru1.0:latest
gemma2:27b
llava-llama3

None of these models recognizes images accurately. For example, they cannot reliably read the information shown in images of vehicle dashboard displays.


@olokelo commented on GitHub (Sep 26, 2024):

minicpm-v is the most advanced vision model available in ollama right now. It is based on qwen2, so it should work fine for Chinese text as well. Make sure your input images are under 1.8 megapixels, as this is a limitation of minicpm-v. In my testing it isn't perfect, but it is certainly more accurate than the older llava-based models, and I was able to do some OCR tasks with it.
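
As a rough illustration of the size limit mentioned above, here is a minimal sketch that computes target dimensions for an image so that it stays within about 1.8 megapixels while keeping its aspect ratio. The function name `fit_under_limit` and the constant `MAX_PIXELS` are hypothetical, and treating the limit as exactly 1,800,000 pixels is an assumption. The actual resize would be done with whatever image tool you already use.

```python
import math

# Assumption: "1.8 megapixels" is taken to mean 1,800,000 pixels.
MAX_PIXELS = 1_800_000


def fit_under_limit(width: int, height: int, max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return (width, height) scaled down so that width * height <= max_pixels.

    The aspect ratio is preserved up to integer rounding. Images that are
    already within the limit are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height

    # Scaling both sides by sqrt(limit / area) scales the area by limit / area.
    scale = math.sqrt(max_pixels / (width * height))
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))

    # Guard against floating-point rounding pushing the area over the limit.
    while new_w * new_h > max_pixels:
        if new_w >= new_h:
            new_w -= 1
        else:
            new_h -= 1
    return new_w, new_h
```

For a 4000x3000 photo, `fit_under_limit(4000, 3000)` gives dimensions whose product stays within the limit, which you can then pass to your resize step before sending the image to the model.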

Other models will probably be added fairly soon: pixtral, qwen2-vl and llama 3.2 vision. For now, however, you can't run them with ollama. Keep an eye on the issues listed by @rick-github to know when they become available.


@goactiongo commented on GitHub (Sep 26, 2024):

Thanks, I appreciate your reply.
 


@dhiltgen commented on GitHub (Oct 24, 2024):

We're seeing good results with llama 3.2. Please give the new 0.4.0 RC a try:

https://github.com/ollama/ollama/releases
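
For anyone who wants to test a vision model directly against Ollama rather than through a front end, here is a minimal sketch using only the Python standard library. It builds a request body for Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` list. The helper names `build_generate_payload` and `send_generate` are hypothetical, the URL assumes a default local install on port 11434, and the model tag in the usage note is an assumption, so substitute whatever tag `ollama list` shows on your machine.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for /api/generate with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON response instead of a stream
    }


def send_generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

A call might look like `send_generate(build_generate_payload("llama3.2-vision", "Read the values shown on this dashboard.", open("dashboard.jpg", "rb").read()))`, where both the model tag and the file name are placeholders.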


Reference: github-starred/ollama#4397