[GH-ISSUE #2562] Inconsistent OCR Results with LLaVA 1.6 and Ollama vs. LLaVA Online Demo #48016

Open
opened 2026-04-28 06:26:35 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @arcaweb-ch on GitHub (Feb 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2562

Hey there, I've posted this issue on the [LLaVA repo](https://github.com/haotian-liu/LLaVA/issues/1116) already; I'm not sure whether this problem stems from an implementation issue in Ollama. Any idea?
GiteaMirror added the question label 2026-04-28 06:26:35 -05:00

@easp commented on GitHub (Feb 17, 2024):

Are you using the fp16 version? I think the online demo uses an unquantized version of the model.

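As a rough illustration of @easp's point (the weight distribution and the naive 4-bit quantizer below are toy assumptions, not Ollama's actual quantization scheme), round-tripping weights through fp16 loses far less precision than low-bit quantization, which is one plausible reason a quantized local model underperforms the unquantized online demo:

```python
# Toy comparison: reconstruction error of fp16 vs. a naive 4-bit
# linear quantizer on synthetic "weights". Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000).astype(np.float32)  # toy weight tensor

# fp16 round-trip: ~10 bits of mantissa, so error stays tiny
w_fp16 = w.astype(np.float16).astype(np.float32)

# naive 4-bit linear quantization: 16 evenly spaced levels over the range
lo, hi = w.min(), w.max()
scale = (hi - lo) / 15
w_q4 = np.round((w - lo) / scale) * scale + lo

err_fp16 = np.abs(w - w_fp16).mean()
err_q4 = np.abs(w - w_q4).mean()
# 4-bit quantization discards far more precision than fp16
```

If an fp16 tag is published for the model on the Ollama registry, pulling it instead of the default quantized tag would be the practical way to test whether quantization explains the gap.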

@donbr commented on GitHub (Feb 18, 2024):

I appreciate you posting the issue with both Ollama and LLaVA.

On the Ollama side, my concern is that the default model uses Mistral, but the only model supported at higher parameter counts uses Vicuna. Refer to the Discord for more info.

The lower-parameter model supports both Vicuna and Mistral.

https://discord.com/channels/1128867683291627614/1128867684130508875/1208258667141402676


@arcaweb-ch commented on GitHub (Feb 18, 2024):

@easp @donbr thanks, I appreciate your thoughts. I tested both the Vicuna and Mistral versions on different hardware setups; both produce the same issue. Could it be related to a different implementation on the Ollama side, as stated [here](https://github.com/ggerganov/llama.cpp/pull/5267)?

@wrapss commented on GitHub (Feb 18, 2024):

Yes: LLaVA 1.6 splits an image into several lower-resolution images for processing, which improves its capabilities. Without this modification (another PR is still pending), the current implementation won't achieve the model's full performance.

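To make the splitting behavior concrete, here is a minimal sketch of cutting a high-resolution image into fixed-size tiles. The 336-pixel tile size and zero-padding strategy are illustrative assumptions, not LLaVA 1.6's actual "AnyRes" configuration:

```python
# Sketch: split an image into fixed-size tiles so a vision encoder
# with a fixed input resolution can process each region at full detail.
import numpy as np

def split_into_tiles(image: np.ndarray, tile: int = 336) -> list[np.ndarray]:
    """Split an HxWxC image into non-overlapping tile x tile patches,
    zero-padding the bottom/right edges so every tile is full size."""
    h, w, _ = image.shape
    pad_h = -h % tile  # padding needed to reach a multiple of `tile`
    pad_w = -w % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    tiles = []
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            tiles.append(padded[y:y + tile, x:x + tile])
    return tiles

# A 672x1008 image yields a 2x3 grid of 336px tiles.
img = np.zeros((672, 1008, 3), dtype=np.uint8)
tiles = split_into_tiles(img)
```

Small text in an image survives this kind of tiling far better than a single global downscale, which is why the pending PR matters so much for OCR.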

@arcaweb-ch commented on GitHub (Feb 19, 2024):

That would actually improve OCR operations a lot (trying to catch @jmorganca's attention :)


@donbr commented on GitHub (Feb 21, 2024):

Apologies for missing the updates. I recommend we work with the LLaVA team to improve our test scenarios. I posted a related discussion item in their GitHub.


@donbr commented on GitHub (Feb 21, 2024):

@arcaweb-ch did you receive an answer from @jmorganca on this? What does Ollama currently have in the form of regression tests for LLaVA?

My test case was comparing image-analysis abilities across LLaVA / OpenAI / Gemini, specifically their ability to tell the difference between a werewolf and a wolf. LLaVA 1.5 on Ollama performed consistently better than the others until 1.6.

- [Discussion on LLaVA site](https://github.com/haotian-liu/LLaVA/discussions/1157)
- [AI Vision Image Analysis / Classification Using Ollama](https://github.com/donbr/visionary_storytelling/blob/main/notebooks/ai_vision_image_classification_ollama.ipynb), a Jupyter notebook using Ollama LLaVA and Dolphin-Mistral.
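A tiny, hypothetical helper (the function name and test captions are invented for illustration) showing how a keyword-based regression check like the werewolf-vs-wolf comparison above could be automated:

```python
# Sketch of a keyword-based pass/fail check for vision-model captions:
# the expected label must appear as a whole word, and none of the
# confusable labels may appear.
import re

def classification_ok(caption: str, expected: str, confusables: list[str]) -> bool:
    """True if the caption names the expected label as a whole word
    and mentions none of the confusable labels."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return expected in words and not any(c in words for c in confusables)

# Whole-word matching matters here: "werewolf" contains "wolf" as a
# substring, so a naive `in` check on the raw string would misfire.
ok = classification_ok("A large grey wolf in the snow", "wolf", ["werewolf"])
bad = classification_ok("Looks like a werewolf costume", "wolf", ["werewolf"])
```

A real regression suite would feed captions returned by the model for a fixed image set into a check like this and track the pass rate across Ollama releases.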

@ChristianWeyer commented on GitHub (May 17, 2024):

> yes, llava1.6 splits an image into several lower-resolution images for processing, which improves its capabilities. Without this modification (another pr is still pending), the current implementation won't have all the model's performance.

Has this all been integrated already?

I am running the latest Ollama (0.1.38) and still see this: https://github.com/haotian-liu/LLaVA/issues/1497#issuecomment-2117167208
Thanks!


Reference: github-starred/ollama#48016