[GH-ISSUE #9020] Llama3.2-vision Doesn't Accept an Array with Single Image. #5867

Closed
opened 2026-04-12 17:12:25 -05:00 by GiteaMirror · 2 comments

Originally created by @chigkim on GitHub (Feb 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9020

What is the issue?

If you pass an array containing a single image to llama3.2-vision, it throws an error.
If you run the same request against minicpm-v, it works.
However, if you send llama3.2-vision the same request through the OpenAI-compatible API, it is accepted without issue.
I understand that llama3.2-vision can handle only one image, but Ollama should be smart enough to accept an array containing a single image.
Thanks!

Relevant log output

.venv\Lib\site-packages\ollama\_client.py", line 168, in inner
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: vision model only supports a single image per message (status code: 500)
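For context, the request that triggers the error above looks roughly like this when built for the `ollama` Python client (the model name is from the report; the image bytes here are a placeholder, and the actual `ollama.chat(...)` call is commented out since it needs a running server):

```python
import base64

def build_chat_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a chat payload whose `images` field is a list holding one image."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": prompt,
            # An array with exactly one base64-encoded image: the case
            # this issue is about.
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
        "stream": False,
    }

payload = build_chat_payload(
    "llama3.2-vision",
    "describe the animals shown in the images",
    b"\xff\xd8placeholder-jpeg-bytes",
)
# With a server running, this payload would be sent as e.g.:
# ollama.chat(model=payload["model"], messages=payload["messages"])
print(len(payload["messages"][0]["images"]))  # 1
```

As the follow-up below shows, a well-formed single-image array like this is accepted; the 500 error came from a client library silently sending two copies of the image.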

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-12 17:12:25 -05:00

@rick-github commented on GitHub (Feb 11, 2025):

llama3.2-vision does accept an array with one image via the ollama API:

 $ echo '{"model": "llama3.2-vision",
         "messages":[{
            "role":"user","content":"describe the animals shown in the images",
            "images": [
              "'"$(base64 puppy.jpg)"'"
            ]
          }],
         "stream":false}' | curl -s http://localhost:11434/api/chat -d @- | jq
{
  "model": "llama3.2-vision",
  "created_at": "2025-02-11T20:03:31.249468548Z",
  "message": {
    "role": "assistant",
    "content": "The image shows a small, white puppy sitting on what appears to be a stone or concrete surface. The puppy is positioned facing towards the right side of the image and has its ears folded back against its head. It is wearing a red collar adorned with a gold bell around its neck.\n\nThe puppy's fur is short and fluffy, giving it a soft and cuddly appearance. Its eyes are dark brown and appear to be looking off into the distance, as if it is gazing at something beyond the frame of the image. The puppy's nose is black, and its mouth is closed, giving it a peaceful expression.\n\nThe background of the image is out of focus, but it appears to be a room or outdoor space with a wall or other surface visible in the distance. The overall atmosphere of the image is one of calmness and serenity, with the puppy appearing relaxed and content as it sits on its perch."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 10183230188,
  "load_duration": 2682934437,
  "prompt_eval_count": 19,
  "prompt_eval_duration": 2266000000,
  "eval_count": 185,
  "eval_duration": 5134000000
}
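The `"$(base64 puppy.jpg)"` substitution in the shell example above inlines the image file as base64 text inside the JSON body. A minimal Python sketch of that encoding step (the bytes are a stand-in for a real image file):

```python
import base64

def encode_image(data: bytes) -> str:
    """Return the base64 text that goes into the `images` array,
    equivalent to `base64 puppy.jpg` in the shell example."""
    return base64.b64encode(data).decode("ascii")

# Round-trip check: decoding recovers the original bytes unchanged.
raw = b"example image bytes"
encoded = encode_image(raw)
assert base64.b64decode(encoded) == raw
```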

@chigkim commented on GitHub (Feb 12, 2025):

Sorry about that! It looks like LlamaIndex duplicates the image for some reason when using the legacy ImageDocument.
Thanks for verifying!

Reference: github-starred/ollama#5867