[GH-ISSUE #3298] Vision with llava-1.6-7B is unusable via CLI #64067

Closed
opened 2026-05-03 16:03:20 -05:00 by GiteaMirror · 21 comments

Originally created by @olafgeibig on GitHub (Mar 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3298

What is the issue?

The image recognition is very poor. It can't describe the picture properly, and it can't extract text either. It seems to process a heavily downscaled image: it complains about the text being too small and makes assumptions about image elements that seem plausible but aren't true. It hallucinates a lot and makes things up.

Basically this model is very capable, but with Ollama it is unusable, and I tried different tags: llava:7b-v1.6-mistral-q5_K_M and llava:7b from the Ollama library. Additionally, I downloaded a Q5_K_M quant of the model from HF and imported it into my local Ollama with a self-created Modelfile. The problem was always the same.

I used this image, a screenshot from a PDF manual:

Screenshot 2024-03-22 at 18 37 08

Prompt:

extract the text underneath the image: /Users/olaf/Desktop/Screenshot\ 2024-03-22\ at\ 18.37.08.png
Added image '/Users/olaf/Desktop/Screenshot 2024-03-22 at 18.37.08.png'

Ollama response:

The image shows a digital advertisement or informational graphic, but it is too small and low resolution to provide a clear transcription of the text. The visible content includes a large purple logo in the upper right corner with the name "SWITCH" below it, and several smaller graphics or logos aligned vertically along the top and bottom edges of the image, which may be related to the product or service advertised on this webpage.

Please note that due to the low resolution and size of the text in the image, it's difficult to provide an exact transcription of what is written.

It did not even try to extract the text; instead it described the image, and the description is complete nonsense.

Converting the image to JPG improved the recognition, but the model still made things up and the text extraction was only partially correct:

The text underneath the image reads:

"Click on any package result to see above. Otherwise the top row will not be visible. A popup will appear and display the packages in the next step.

The first group of buttons is for the product options. The second set of buttons are for different shipping options. The third set ofbuttons allows you to change your email if needed.

Please note that these buttons might not be active on this page but they will be when you click through from a live site."

Using a Python script as described in https://ollama.com/blog/vision-models does not improve anything.
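For context, the pattern from that blog post boils down to POSTing a base64-encoded image to the local Ollama API. A minimal sketch of that pattern, using only the standard library (the image path is hypothetical; assumes a local server on the default port 11434):

```python
import base64
import json
from urllib.request import Request, urlopen

def encode_image(path: str) -> str:
    """Read an image file and return it base64-encoded, as /api/chat expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def extract_text(image_path: str, model: str = "llava:7b",
                 host: str = "http://localhost:11434") -> str:
    """POST one prompt plus one image to the local Ollama chat endpoint."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": "extract the text underneath the image",
            "images": [encode_image(image_path)],
        }],
        "stream": False,  # return one JSON object instead of a token stream
    }
    req = Request(f"{host}/api/chat",
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    print(extract_text("screenshot.png"))  # hypothetical image path
```

Whether the image is passed this way or via the CLI, the server receives the same base64 payload, so the poor results above are not an artifact of the CLI path handling.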

Log output after script execution with the png file:

time=2024-03-22T20:03:14.404+01:00 level=INFO source=dyn_ext_server.go:172 msg="loaded 1 images"
{"function":"launch_slot_with_data","level":"INFO","line":833,"msg":"slot is processing task","slot_id":0,"task_id":637,"tid":"0x172f5f000","timestamp":1711134194}
{"function":"update_slots","level":"INFO","line":1852,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":637,"tid":"0x172f5f000","timestamp":1711134194}
encode_image_with_clip: image embedding created: 576 tokens

encode_image_with_clip: image encoded in   658.77 ms by CLIP (    1.14 ms per image patch)
{"function":"print_timings","level":"INFO","line":264,"msg":"prompt eval time     =    6374.43 ms /     1 tokens ( 6374.43 ms per token,     0.16 tokens per second)","n_prompt_tokens_processed":1,"n_tokens_second":0.15687684555803996,"slot_id":0,"t_prompt_processing":6374.427,"t_token":6374.427,"task_id":637,"tid":"0x172f5f000","timestamp":1711134212}
{"function":"print_timings","level":"INFO","line":278,"msg":"generation eval time =   11258.49 ms /   321 runs   (   35.07 ms per token,    28.51 tokens per second)","n_decoded":321,"n_tokens_second":28.511811350934032,"slot_id":0,"t_token":35.073183800623056,"t_token_generation":11258.492,"task_id":637,"tid":"0x172f5f000","timestamp":1711134212}
{"function":"print_timings","level":"INFO","line":287,"msg":"          total time =   17632.92 ms","slot_id":0,"t_prompt_processing":6374.427,"t_token_generation":11258.492,"t_total":17632.919,"task_id":637,"tid":"0x172f5f000","timestamp":1711134212}
{"function":"update_slots","level":"INFO","line":1660,"msg":"slot released","n_cache_tokens":322,"n_ctx":2048,"n_past":907,"n_system_tokens":0,"slot_id":0,"task_id":637,"tid":"0x172f5f000","timestamp":1711134212,"truncated":false}
[GIN] 2024/03/22 - 20:03:32 | 200 | 17.686484875s |       127.0.0.1 | POST     "/api/chat"

What did you expect to see?

I compared it with the results from this HF Space running the same model in FP16, but most probably with PyTorch and Transformers: https://huggingface.co/spaces/merve/llava-next

prompt:

extract the text underneath the image

response:

The text underneath the image is as follows:

"Click on any package results in additional information popping up, as seen above. Otherwise, the top row of text buttons represents view filters for seeing and sorting packs differently. The first group of buttons offers to filter packages by their source, either by showing only those by Bitwig, only those by Artists, or only those from Partners (like sound design companies, etc.). Or simply turn off this filter to see all packages

This is a good answer. It is exactly the text that appears beneath the picture in the screenshot.

Steps to reproduce

  1. ollama run llava:7b
  2. Use the uploaded picture with the prompt from above

Are there any recent changes that introduced the issue?

No response

OS

macOS

Architecture

arm64

Platform

No response

Ollama version

0.1.29

GPU

Apple

GPU info

M1 Pro

CPU

Apple

Other software

No response

GiteaMirror added the bug label 2026-05-03 16:03:20 -05:00

@MorrisLu-Taipei commented on GitHub (Mar 23, 2024):

No surprise, llava is always poor at image recognition. This is not an Ollama issue; you may have to ask the llava team.


@olafgeibig commented on GitHub (Mar 23, 2024):

I disagree. The model performs very well in that regard, better than any other model <= 7B that I tested. Please read the "What did you expect to see?" section. The same model served in an HF Space does near-perfect OCR and description of the image.


@igorschlum commented on GitHub (Mar 23, 2024):

Hi @olafgeibig

There are 3 llava models; the biggest is the best. I cropped the image you provided to keep only the part with the text in black and white and asked llava: can you give me the text written on this image? /Users/igor/Documents/readme
... .png

The result is:

The image contains text that appears to be instructions for filtering
packages or content, possibly on a website or software platform. Here is
the transcribed text:

"Click on any package result is additional information popping up as seen
above otherwise the top row of text buttons represents view filters for
seeing and sorting packs differently. The first group of buttons offers to
filter packages by their source such as Only By Blizz, only by Artists, or
simply turn off this page. The second group of buttons offers to filter
package by their status installed/uninstalled meaning if they are
available on your computer. Finally, the third group offers sorting
options one option is to sort packages alphabetically with the Name 1
button Or choose to sort packages based on their release date with the
Recommended button."

When I used the same picture and the same prompt with the llava:13b model, I got this answer:

The text in the image appears to be a bit blurry and stretched due to the
perspective, making it difficult to read accurately. However, from what is
visible, it seems to be a description or instructions related to sorting
or filtering packages on a website or application. If you provide a
clearer image or more details about the text, I could try to transcribe it
for you.

I also used an old llava 1.3 from 3 months ago (a 7B model, and it was working), but the recognition was not as good in 1.3 as in 1.6, so the result was:

The first group of buttons offers filters for package results, additional
information popping up as seen above any row of text buttons representing
various views. Filters are available from the source by showing on either
BittyWig (artists), a like sound design company, or simply by using
Partner's like sound design. They have also included simple options for
sorting packages alphabetically with their name button and choosing to
sort by date when the release is announced, which will be displayed on
this page.


@marksalpeter commented on GitHub (Mar 24, 2024):

I have the same issue. The Hugging Face deployments of 1.6 are able to read text out of most of the document images I've given them. When I give the same images to Ollama, it is unable to read them. I'm on an Intel Mac though.


@igorschlum commented on GitHub (Mar 24, 2024):

@marksalpeter Which version of the model do you use? It seems like the llava:34b-v1.6 works better. I'm trying to install this version from Hugging Face: https://huggingface.co/llava-hf/llava-v1.6-34b-hf but have to spend some time doing that.


@olafgeibig commented on GitHub (Mar 24, 2024):

My issue is with the 7B model; I can't run a 34B model. Anyway, the point is that there must be a bug in Ollama or probably in the underlying llama.cpp.


@igorschlum commented on GitHub (Mar 25, 2024):

@olafgeibig On Hugging Face, did you test with the 7B model? The 34B model works fine with Ollama, so it could be an issue with the 7B model itself.


@olafgeibig commented on GitHub (Mar 25, 2024):

Yep, the HF Space where they demo the new Transformers feature for using Llava models runs the 7B model. See for yourself: https://huggingface.co/spaces/merve/llava-next/blob/main/app.py


@olafgeibig commented on GitHub (Apr 3, 2024):

bump


@CoderCowMoo commented on GitHub (Apr 6, 2024):

Can confirm (see the two attached screenshots). And the HF Space says that the llava 1.6 Mistral 7B variant is being used:

Try LLaVA NeXT in this demo (more specifically, the Mistral-7B variant).


@igorschlum commented on GitHub (Apr 18, 2024):

I ran the test again with ollama run llava:13b on version 0.1.32,
and I got this answer, which is quite good, since the AI always rewrites the text it recognizes through OCR.

Sure, the text written in the image is as follows:

"Click on any package results is add all information pop-up to see the top
row of text button represents view showing only packages that are
different from their source.

The first group of buttons offers filters by their source - either by the
artists or not those by partner companies (e.g. "Bandcamp" or
"SoundCloud"). The second group of buttons offers filters by their status

  • either available for download but are A+, meaning they've been
    pre-installed and can be used simply by clicking on the download link, or
    again, simply turn off this feature to see all packages in one place.

Finally, the third group of buttons offers sort options. One option is to
sort by name alphabetically with the "Name" button. The other is to choose
a sort based on their release date with the "Release Date" button.
Finally, simply turn off this feature to see all packages in one place."


@igorschlum commented on GitHub (Apr 19, 2024):

I loaded llava 7b with version 0.1.32 and got a good result with this image:

ollama run llava:7b

can you give me the full text of this image? /Users/igor/Documents/readme.png

Added image '/Users/igor/Documents/readme.png'
Sure, here's the full text from the image:

click on any package result is additonal information popping up as seen 
above otherwise the top row of buttons are different filters for seeing 
all products by sorting them by price low to high or showing only items 
that have been installed either by me or someone else. The first group of 
buttons offers filter packages by their source. These are not shown but 
they are there if you need to see them.

Below this is the second group of buttons which offer different types of 
packages. You can use a keyboard shortcut to navigate through these 
buttons. However, the package installer is disabled for now. There may be 
some hidden options that you cannot find. The package installer does not 
work without being installed but are available if you simply turn them on.
This is an important message to ensure that users are aware of all 
available packages in order to sort their preferences.

The first set of buttons allows a user to see the name and description of 
each package as well as any buttons that might be associated with it. The 
second row of buttons is disabled for now but simply turning them on will 
allow you to see all packages that are not installed or simply turn off 
this option if you wish to disable it.

The last set of buttons allows you to choose the third group of options 
which are not fully visible in this image and do not have any associated 
keyboard shortcuts. These options are also not fully visible in this 
image. It is important to note that all packages can be sorted into 
different categories based on their function or the number of packages 
they belong to. The final set of buttons allows you to choose your 
preferences. This is an important message to ensure that users are aware 
of all available packages in order to sort their preferences.

The result is not exactly the same as the image, but I think that we can close this issue if @olafgeibig agrees.


@hx507 commented on GitHub (May 3, 2024):

Also seeing the same issue, where llava from Ollama performs significantly worse than the web-hosted versions.

> I loaded llava 7b with version 0.1.32 and I get a good result with this image:

Trying the exact same thing but the result seems very bad with latest ollama.

>>> can you give me the full text of this image ? /home/xxx/Desktop/readme.png 
Added image '/home/xxx/Desktop/readme.png'
 The image contains a humorous meme with a visual pun. Here's the text in a semi-transparent format:

"DON'T PAY ANY ATTENTION TO THE DISTINCT LACK OF LOGICAL SEQUENCES OR THE CLEAR LACK OF COHERENCE BECAUSE THIS IS A MEME" 

I was able to give the Ollama 34b a try. However, the 34b just keeps complaining that the image is blurry.

>>> can you give me the full text of this image ? /home/xxx/Desktop/readme.png
Added image '/home/xxx/Desktop/readme.png'
The image appears to contain a series of text bubbles, likely from an online platform or messaging service, with the word "DUPED" in each one. 
The text is not fully visible due to the angle and resolution of the image, but it seems to be repetitive, possibly indicating a pattern or 
trend being discussed within those messages.


@hx507 commented on GitHub (May 3, 2024):

Interestingly, restarting ollama server makes the first image query work.

For anything other than the first image query uploaded (even with a fresh client session), the model will just output nonsense or complain about it being too blurry.


@jacopofar commented on GitHub (May 8, 2024):

I had the same issue here. Running Ollama through Docker and then

docker exec -it ollama ollama run llava:7b

I get an interactive prompt, where I tried these commands:

  • Describe this image
  • Describe this image <path to a real image on a mounted docker volume>
  • Describe this image <invalid path>

in all cases it happily hallucinates and describes an image that does not exist; in cases 2 and 3 it outputs "Added image <path>" first but gives no warning that the file does not exist.

I restarted ollama server and this time it described the image correctly.

Still, if I pass a path that does not exist it just invents a description.

I did more tests, and it seems the first image I try to describe after a server restart (not just a new session, but restarting the Ollama server itself) is reliable. After that it may or may not work: sometimes it keeps "seeing" the previous image, and sometimes it sees nothing at all and generates a random description. I could not find out what creates this broken state.
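One quick way to check for this stale-state behaviour is to send two genuinely different images in a row and compare the answers: identical responses for different images suggest the second request reused the first image's embedding. A rough standard-library sketch against the local /api/chat endpoint (the image paths are hypothetical, not the commenter's setup):

```python
import base64
import json
from urllib.request import Request, urlopen

def ask(path: str, host: str = "http://localhost:11434") -> str:
    """Send one image to /api/chat and return the model's reply."""
    with open(path, "rb") as f:
        img = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "model": "llava:7b",
        "messages": [{"role": "user",
                      "content": "Describe this image in one sentence.",
                      "images": [img]}],
        "stream": False,
    }
    req = Request(f"{host}/api/chat",
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

def looks_stale(reply_a: str, reply_b: str) -> bool:
    """Crude heuristic: identical answers for two different images hint
    that the server reused the first image's embedding for the second."""
    return reply_a.strip() == reply_b.strip()

if __name__ == "__main__":
    a = ask("/tmp/first.png")   # hypothetical test images
    b = ask("/tmp/second.png")
    print("possible stale state:", looks_stale(a, b))
```

In practice the broken state described above may also produce a different but unrelated answer rather than a verbatim repeat, so a manual read of both replies is still the more reliable check.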


@hx507 commented on GitHub (May 10, 2024):

Looking at the release note of 0.1.34 I think this is already addressed:

  • Fixed issues with LLaVa models where they would respond incorrectly after the first request

It seems fixed for me on llava:34b.

@jmorganca commented on GitHub (May 10, 2024):

Hi folks 0.1.34 should fix these issues. Sorry this happened. Thanks for creating an issue 😊

@AverageGuy commented on GitHub (Jun 13, 2024):

I have the same problem on 0.1.43, using llava:13b-v1.6.
The first time I ran it, it said there was no text. The second time it said "Here's a cool picture of my cat". The third time it returned:

```json
{
  "text": "JOIN THE GLOBAL MOVEMENT FOR A BETTER FUTURE",
  "font_size": "28px",
  "font_family": "Arial Black",
  "background_color": "transparent",
  "text_color": "#ffffff"
}
```

The fourth time:

```json
{
  "text": "AI-generated art can be an interesting way to explore creativity with technology. It allows us to push boundaries and see what is possible, even if it may not always be perfect or conventional."
}
```
The image is a 2000x191 pixel grey image of a set of transactions from a CC statement. Here's the code:
```python
import ollama
import base64

def get_base64_encoded_image(image_path):
    with open(image_path, "rb") as img_file:
        return base64.b64encode(img_file.read()).decode('utf-8')

b64 = get_base64_encoded_image("/home/jwl/3LineTrans.png")

# Initialize the Ollama client
ollama_client = ollama.Client()

# Define the path to your image
image_path = '/home/jwl/3LineTrans.png'

# Prepare the message to send to the LLaVA model
message = {
    'role': 'user',
    'content': 'Print the text found in this image in json format.',
    'image': {
        'type': 'base64',
        'media_type': 'image/png',
        'data': b64
    }
}

# Use the ollama.chat function to send the image and retrieve the response
response = ollama_client.chat(
    # model="llava:latest",  # Specify the desired LLaVA model size
    model="llava:13b-v1.6",  # Specify the desired LLaVA model size
    messages=[message]
)

# Print the model's description of the image
print(response)
print(response['message']['content'])
```
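One possible cause of the random answers in the snippet above: the ollama Python client documents multimodal messages via an `images` list on the message, not a nested `image` dict, so the dict shown above may be silently ignored and the model never sees the picture. A sketch of the documented shape (the prompt, model tag, and path are taken from the comment above; the actual call is commented out since it needs a running server):

```python
def build_image_message(prompt, image_path):
    # Per the ollama Python client docs, `images` accepts file paths,
    # raw bytes, or base64-encoded strings.
    return {
        'role': 'user',
        'content': prompt,
        'images': [image_path],
    }

msg = build_image_message(
    'Print the text found in this image in json format.',
    '/home/jwl/3LineTrans.png',
)

# import ollama
# response = ollama.Client().chat(model='llava:13b-v1.6', messages=[msg])
# print(response['message']['content'])
```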

@AverageGuy commented on GitHub (Jun 13, 2024):

I don't know what I did to mess up the formatting. I used the <> button, which inserted a couple of backticks, and I pasted the text between them. It doesn't seem to like comments.

@samyan commented on GitHub (Aug 12, 2024):

Any update on this for llava 8b?

@Falkensmaze0 commented on GitHub (Oct 11, 2024):

@olafgeibig On hugging face, did you test with the 7B model? The 34B model works fine with Ollama, so it could be an issue with the 7B model itself.

Currently using the Llava:34b-v1.6-fp16 with Ollama, and I've had practically zero luck with any type of image reading. The infamous "picture is blurry or too small" has been a constant, unfortunately.

I've tried calling the ollama.chat() from ollama's python library, similar results, except the model complains about the start/stop parameters in the template as well. (template is original)

UPDATE
I recreated the model in Ollama with the temperature set to 0.1. Now it's working with approx. 85% accuracy, though it still occasionally skips lines and changes units.

UPDATE2
Prompting with the phrase "refrain from rephrasing the text, or any alterations whatsoever. report only what is in the image" improved the accuracy even further. I suppose proper image pre-processing coupled with LLM post-processing could achieve near-perfect results.
It might also be worth noting that fine-tuning the model on your subject-matter vocabulary could help if anyone intends to use it on specialized material or academic literature.

OS: Ubuntu 24.04 LTS
CPU: Intel 14900K
GPU: RTX 4090 TUF
RAM: 128GB DDR5
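For reference, the temperature and prompting changes described above can be baked into an Ollama Modelfile (the base model tag and new model name here are examples):

```
FROM llava:34b-v1.6-fp16
PARAMETER temperature 0.1
SYSTEM "Refrain from rephrasing the text, or any alterations whatsoever. Report only what is in the image."
```

Then create and run it with `ollama create llava-ocr -f Modelfile` followed by `ollama run llava-ocr`.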

Reference: github-starred/ollama#64067