[GH-ISSUE #3322] I can't make vision models work #27802

Closed
opened 2026-04-22 05:23:56 -05:00 by GiteaMirror · 8 comments

Originally created by @donnadulcinea on GitHub (Mar 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3322

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I am running ollama via docker. Everything works smoothly except vision models.
I tried llava and bakllava with no success.

What did you expect to see?

The description of the image I provided.

Steps to reproduce

Run an instance of ollama with docker and pull the latest llava or bakllava model.
Make a test query, exactly as in
https://github.com/ollama/ollama/blob/main/docs/api.md#request-with-images
The answer is not as expected; it is always random. For example:

{
    "model": "llava",
    "created_at": "2024-03-24T05:02:22.859351985Z",
    "response": " The image shows a person sitting at a table with some papers or documents. The focus is on the person's face, which appears to be in deep thought or concentration. There are no other discernable objects or details in the picture. ",
    "done": true,
    "context": [...

I tried both llava and bakllava; every other model seems to work smoothly. I tried with the highest-quality images and with simple-content images.
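For anyone reproducing the steps above from a script instead of curl, here is a minimal sketch of the same request in Python. The field names (`model`, `prompt`, `stream`, `images`) and the default `localhost:11434` address are the ones used in the linked API example; the helper names `build_generate_payload` and `describe_image` are made up for illustration. It assumes the image is sent as plain base64 of the file bytes with no `data:` URI prefix, which is what the docs example appears to do.

```python
import base64
import json
import urllib.request


def build_generate_payload(image_bytes: bytes, model: str = "llava",
                           prompt: str = "What is in this picture?") -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    # Plain base64 of the raw file bytes (assumption: no "data:image/..." prefix).
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"model": model, "prompt": prompt, "stream": False, "images": [encoded]}


def describe_image(path: str, host: str = "http://localhost:11434") -> str:
    """Send one image to a running ollama instance and return its description."""
    with open(path, "rb") as f:
        payload = build_generate_payload(f.read())
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_image("photo.png")` needs a reachable ollama server; `build_generate_payload` can be checked offline to confirm that the bytes being sent are really the image you intended.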

Are there any recent changes that introduced the issue?

No response

OS

Linux

Architecture

arm64

Platform

Docker

Ollama version

ollama version is 0.1.28

GPU

No response

GPU info

No response

CPU

No response

Other software

No response

GiteaMirror added the "needs more info" and "bug" labels 2026-04-22 05:23:56 -05:00

@marksalpeter commented on GitHub (Mar 24, 2024):

Might be a duplicate of #3298. I'm having the same issue running llava locally on my mac.


@donnadulcinea commented on GitHub (Mar 25, 2024):

> Might be a duplicate of #3298. I'm having the same issue running llava locally on my mac.

I saw that issue, but I opened this one because it seems to be a different problem:
first, I am experimenting through the REST API; second, that issue seems to involve something like downsampling.
Anyway, thank you for linking it. They may be related.


@mxyng commented on GitHub (Mar 27, 2024):

I can't reproduce this. Using the example from the link, this is what I get:

$ curl http://localhost:11434/api/generate -d '{
  "model": "llava",
  "prompt":"What is in this picture?",
  "stream": false,
  "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDS
sxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPq
smVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}'
{"model":"llava","created_at":"2024-03-27T22:43:39.689889Z","response":" The image shows an animated character that appears to be a stylized, cartoon-like creature. It's drawn in black and white with simple lines, giving it a cute and whimsical appearance. This character is often associated with a certain type of internet meme known as \"doge memes.\" ","done":true,"context":[733,16289,28793,1824,349,297,456,5754,28804,733,28748,16289,28793,415,3469,4370,396,25693,3233,369,8045,298,347,264,341,2951,1332,28725,7548,4973,28733,4091,15287,28723,661,28742,28713,10421,297,2687,304,3075,395,3588,4715,28725,5239,378,264,17949,304,388,7805,745,9293,28723,851,3233,349,2608,5363,395,264,2552,1212,302,7865,1626,28706,2651,390,345,17693,28706,1626,274,611,28705],"total_duration":2748008333,"load_duration":646098000,"prompt_eval_count":1,"prompt_eval_duration":1060742000,"eval_count":66,"eval_duration":1040527000}

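Here's the image, for reference, decoded from the base64 input:

![tmp](https://github.com/ollama/ollama/assets/2372640/d28b1013-90a4-4e96-919d-357cac8a8cd6)

While it's not perfect, it's in line with the LLaVA [demo](https://llava.hliu.cc/), which uses a much larger model (34b vs. mistral 7b):

![image](https://github.com/ollama/ollama/assets/2372640/bbb1e521-48eb-4fcc-8434-373447920701)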
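One way to rule out a mangled payload before blaming the model is to decode the base64 string locally and look at the PNG header. The sketch below is not something from the thread; it assumes the input is a PNG, as in the example request, and the function name `inspect_png_base64` is hypothetical. It relies only on the fixed PNG layout: an 8-byte signature followed by an `IHDR` chunk whose first two fields are the width and height.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def inspect_png_base64(encoded: str) -> tuple[int, int]:
    """Decode a base64 string and return (width, height) if it is a PNG."""
    # validate=True rejects stray characters such as spaces or line breaks.
    raw = base64.b64decode(encoded, validate=True)
    if len(raw) < 24 or raw[:8] != PNG_SIGNATURE:
        raise ValueError("payload does not start with a PNG signature")
    if raw[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # Width and height are two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", raw[16:24])
    return width, height
```

If this raises, or reports dimensions that do not match the file you meant to send, the problem is in how the image was encoded or pasted and not in llava itself.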


@marksalpeter commented on GitHub (Mar 28, 2024):

@mxyng the easiest way for me to reproduce this issue is to give llava a large document containing text. It's clear that the demo versions deployed on the web are able to read and interpret the text in the image. The ollama version of llava consistently either lies about the content of the documents, or can only read the largest headlines on the page and then claims that it cannot read further details because the text is "blurry". This indicates that it is unable to interpret the text on the page.

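Example Image: This LLaVa 1.6 abstract

![Screenshot 2024-03-28 at 1 00 59 PM](https://github.com/ollama/ollama/assets/1033500/82fc772c-6a55-4b32-93ca-bd2a049d5d6f)

Example results from https://llava.hliu.cc/

![Screenshot 2024-03-28 at 1 03 48 PM](https://github.com/ollama/ollama/assets/1033500/d1d4ea5f-6350-41af-aaa6-76a20f7d31e9)

Example results from ollama

![Screenshot 2024-03-28 at 1 10 43 PM](https://github.com/ollama/ollama/assets/1033500/98415ba1-e18e-4124-aa96-a3cb6ca80161)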

@sebastian-philipp commented on GitHub (Apr 12, 2024):

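I can reproduce this here:

![image](https://github.com/ollama/ollama/assets/2574405/7830ce27-3b33-4814-8082-5e633609f972)

source is:

![image](https://github.com/ollama/ollama/assets/2574405/13abdaf2-141f-460a-bebc-10073944510e)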


@chiragv24 commented on GitHub (May 19, 2024):

Did anyone manage to find a solution to this? It was said that Ollama 0.1.34 solved the issue, but that isn't the case for me.


@solar-sprout commented on GitHub (May 27, 2024):

I set this up using ollama 0.1.38 and llava-llama-3-8b-v1.1. I followed the instructions on the model card and used the int4 model.

https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf

$ ollama --version
ollama version is 0.1.38

I curled the llama image request above and this was the response:

In the image, there's a playful scene featuring a cartoon cat. The cat, drawn in black and white, is sitting on its hind legs with its front paws raised as if it's about to perform or execute something. Its eyes are closed, suggesting a sense of concentration or anticipation.\n\nThe cat's head is slightly tilted to the left, adding to its expressive posture. It has a small nose and ears, typical characteristics of many cartoon cats...

I too was getting generic and random replies to every image (usually describing coffee and cafes), but my problem was that I forgot to include the mmproj file in my model file.

FROM ./models/llava-llama-3-8b-v1_1-int4.gguf
FROM ./models/llava-llama-3-8b-v1_1-mmproj-f16.gguf

llava is now working on ollama for me with both files in place.
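A quick sanity check for the mistake I made, as a rough sketch only: it scans the model file text for `FROM` lines and looks for one whose path contains "mmproj". That naming is just the convention used by the files above, not something ollama enforces, and the helper names `from_paths` and `has_projector` are made up for this example.

```python
def from_paths(modelfile_text: str) -> list[str]:
    """Return the argument of every FROM line in a model file."""
    paths = []
    for line in modelfile_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].upper() == "FROM":
            paths.append(parts[1].strip())
    return paths


def has_projector(modelfile_text: str) -> bool:
    """Heuristic: does any FROM line point at a file named like a projector?"""
    # Assumption: projector files carry "mmproj" in their file name.
    return any("mmproj" in path.lower() for path in from_paths(modelfile_text))


MODELFILE = """\
FROM ./models/llava-llama-3-8b-v1_1-int4.gguf
FROM ./models/llava-llama-3-8b-v1_1-mmproj-f16.gguf
"""

if __name__ == "__main__":
    print("projector referenced:", has_projector(MODELFILE))
```

If the check comes back false for a vision model built from local GGUF files, a missing projector is a likely reason for image-independent answers.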


@dhiltgen commented on GitHub (Nov 6, 2024):

Is this still a problem for folks?

Please give 0.4.0 a try, and llama3.2 vision. You might also have better luck with llava:34b if your image is more complicated.

Reference: github-starred/ollama#27802