[GH-ISSUE #7388] Llama3.2-vision - fails to process png files #30456

Closed
opened 2026-04-22 10:04:37 -05:00 by GiteaMirror · 21 comments
Owner

Originally created by @pitimespi on GitHub (Oct 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7388

What is the issue?

Couldn't process image: "invalid image type: application/octet-stream"
Error: invalid image type: application/octet-stream

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

3.2-vision 0.4.0-rc5

GiteaMirror added the bug label 2026-04-22 10:04:37 -05:00

@pitimespi commented on GitHub (Oct 27, 2024):

Also get this error on some jpg files as well:

Couldn't process image: "invalid image type: image/webp"
Error: invalid image type: image/webp

@rick-github commented on GitHub (Oct 28, 2024):

How are you calling the model?

$ file puppy.png
puppy.png: PNG image data, 366 x 555, 8-bit/color RGB, non-interlaced
$ (echo '{"model":"x/llama3.2-vision","messages":[{"role":"user","content":"describe this image","images":["' ; base64 -w0 puppy.png ; echo '"]}],"stream":false}') | curl -s localhost:11434/api/chat -d @- | jq
{
  "model": "x/llama3.2-vision",
  "created_at": "2024-10-28T00:21:18.795239901Z",
  "message": {
    "role": "assistant",
    "content": "The image features a small, white puppy sitting on a stone surface. The puppy is positioned in the center of the frame and faces to the right, with its head slightly turned towards the camera. It has short, fluffy fur that appears to be either pure white or very light-colored, making it difficult to discern any darker markings.\n\nThe puppy's ears are folded back against its head, giving it a cute and endearing appearance. A small red collar encircles its neck, adorned with a shiny gold bell that adds a touch of elegance to the overall scene.\n\nIn the background, the stone surface on which the puppy is sitting provides a subtle yet rustic contrast to the puppy's soft fur. The image is well-lit, suggesting that it was taken outdoors during the daytime or in a brightly lit indoor setting. Overall, the image presents a charming and intimate portrait of a small white puppy, capturing its innocence and playfulness."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 5857875161,
  "load_duration": 32072587,
  "prompt_eval_count": 13,
  "prompt_eval_duration": 1994000000,
  "eval_count": 186,
  "eval_duration": 3742000000
}

@pitimespi commented on GitHub (Oct 28, 2024):

ollama run x/llama3.2-vision:latest "Describe image: <path_to_image>"

I have this working for some jpeg files.

@rick-github commented on GitHub (Oct 28, 2024):

Can you verify that the files are actually JPG files and not mis-named WEBP files? Use the `file` command to identify the file type.
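For scripting this check, the same identification can be done without the `file` utility by sniffing the magic bytes, which ignores the (possibly wrong) file extension. A minimal sketch; the function name is my own, and the MIME strings mirror the errors reported in this thread:

```python
def sniff_image_type(data: bytes) -> str:
    """Identify an image format from its leading bytes, ignoring the extension."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):              # PNG signature
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):                   # JPEG SOI marker
        return "image/jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":      # WEBP is RIFF-wrapped
        return "image/webp"
    if data[4:12] in (b"ftypheic", b"ftypheix", b"ftypmif1"):  # common HEIF brands
        return "image/heif"
    return "application/octet-stream"

# Example: a mis-named .jpg that is really WEBP is still detected
header = b"RIFF" + b"\x00\x00\x00\x00" + b"WEBP" + b"VP8 "
print(sniff_image_type(header))  # image/webp
```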

@rick-github commented on GitHub (Oct 28, 2024):

There's an issue with passing image files via the CLI: https://github.com/ollama/ollama/issues/7386

@pitimespi commented on GitHub (Oct 28, 2024):

Hmm.. This is getting even more interesting.

File 1:
`ISO Media, HEIF Image HEVC Main or Main Still Picture Profile`

So I saved it as .png; `file <converted file>` reports:
`PNG image data, 3088 x 2320, 8-bit/color RGB, non-interlaced`

And sent it to Ollama:

ollama run x/llama3.2-vision:latest "Describe image: <..>"
Added image '<..>'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


The output is just a bunch of `!`

File 2:
Yes, it turned out to be a webp file.
`RIFF (little-endian) data, Web/P image, VP8 encoding, 1024x1024, Scaling: [none]x[none], YUV color, decoders should clamp`

I saved it as .jpg:
`JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 1024x1024, components 3`

Ollama output:

Error: POST predict: Post "http://127.0.0.1:39043/completion": EOF

I am using ImageViewer on Ubuntu 24 to convert. Maybe this tool is no good? I am getting decent results on a few other jpeg and png images, so I think I just happen to be feeding images of questionable source to Ollama. The ones that work fine are the ones I downloaded from my own camera.

@mchiang0610 commented on GitHub (Oct 28, 2024):

@pitimespi Sorry about this. Do you have the image itself for us to test?

The error is showing that it is a webp image, not a PNG.

Would love to get the converted JPEG from you.

@pitimespi commented on GitHub (Oct 28, 2024):

![image1_converted](https://github.com/user-attachments/assets/06b1e0a8-1c49-4792-8fc1-82f821a004cf)

@rick-github commented on GitHub (Oct 28, 2024):

Does it work better if you use the interactive mode?

$ ollama run x/llama3.2-vision:latest
>>> Describe image: <path_to_image>

Might be related to https://github.com/ollama/ollama/issues/7362.

@pitimespi commented on GitHub (Oct 28, 2024):

> Does it work better if you use the interactive mode?

No.

@pitimespi commented on GitHub (Oct 28, 2024):

When I get `Error: POST predict: Post "http://127.0.0.1:39043/completion": EOF`, I ran `journalctl -u ollama --no-pager`, which showed `ollama[1983902]: [GIN] 2024/10/27 - 20:14:37 | 200 | 2.946318763s | 127.0.0.1 | POST "/api/generate"`, which seems to confirm that the issue is in fact related to #7362.

@pitimespi commented on GitHub (Oct 28, 2024):

Hitting `/api/chat` directly and passing base64-encoded strings worked for all images! Should I just close this as a dup of #7362 then?
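That workaround can be sketched in a few lines. This is a minimal, hedged example assuming a default Ollama server on `localhost:11434`; the helper names are mine, and the payload shape matches the curl example earlier in the thread:

```python
import base64
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_path: str) -> dict:
    # /api/chat expects images as base64 strings in the message's "images" list
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt, "images": [b64]}],
        "stream": False,
    }


def chat(payload: dict, host: str = "http://localhost:11434") -> str:
    # POST the payload to /api/chat and return the assistant's reply text
    req = urllib.request.Request(
        host + "/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


# Usage (with the server running):
#   payload = build_chat_payload("x/llama3.2-vision", "describe this image", "puppy.png")
#   print(chat(payload))
```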

@rick-github commented on GitHub (Oct 28, 2024):

Interactive mode uses `/api/chat`, so if you are still seeing problems there, something else may be going on; perhaps not a dupe of #7362.

@pdevine commented on GitHub (Oct 28, 2024):

Just a few things before we can close this issue:

  • The `!!!!!!!` issue is because of a kv cache problem that we're still sorting out.
  • webp isn't supported right now, just jpeg and png (although we can certainly add it in the future)
  • I've just merged #7384, which fixes the issues on `/api/generate`

@pitimespi Where are you seeing `Error: POST predict: Post "http://127.0.0.1:39043/completion": EOF`? That shouldn't be something generated by Ollama (and it doesn't quite look like the OpenAI API `/v1/completions` endpoint, which doesn't support images anyway, unlike `/api/generate`).

@rick-github commented on GitHub (Oct 28, 2024):

It's returned from the runner:
https://github.com/ollama/ollama/blob/084929c29318c6a6604f1b01eb0a782cb31242ba/llm/server.go#L764

$ ollama:0.4.0-rc5 run x/llama3.2-vision describe this image: ./puppy.jpg
Added image './puppy.jpg'
Error: POST predict: Post "http://127.0.0.1:45799/completion": EOF
@oderwat commented on GitHub (Oct 28, 2024):

I think this is exactly the same as #7362

PNG is working just fine using `chat`.

I think what is easy to miss is that:

1. `ollama run llama3.2 "What is a LLM?"`
2. `echo "What is a LLM?" | ollama run llama3.2`
3. `ollama run llama3.2` followed by `>>> What is a LLM?` at the interactive prompt

do not all use the same API endpoint: 1. and 2. use "generate", while 3. uses "chat".

@rick-github commented on GitHub (Oct 28, 2024):

Austin indicates in https://github.com/ollama/ollama/issues/7388#issuecomment-2440385959 that his problem is not fixed by using option 3 (interactive mode).

@oderwat commented on GitHub (Oct 28, 2024):

@rick-github Well, maybe he could double-check that. Using https://github.com/ollama/ollama/issues/7362#issuecomment-2442862378 I can run jpg and png just fine.

@pdevine commented on GitHub (Oct 29, 2024):

@rick-github ack, the runner errors shouldn't be bleeding back out that way. This is a side effect of the new runner code.

@pitimespi commented on GitHub (Oct 29, 2024):

I can confirm I no longer have the issues I mentioned above (well, I can work around them now).

@pitimespi commented on GitHub (Oct 29, 2024):

Closing as all the remaining issues are known issues and are tracked elsewhere.

Reference: github-starred/ollama#30456