[GH-ISSUE #12789] Cloud doesn't support images in /api/generate. #8484

Closed
opened 2026-04-12 21:10:49 -05:00 by GiteaMirror · 2 comments

Originally created by @rick-github on GitHub (Oct 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12789

Originally assigned to: @gr4ceG on GitHub.

What is the issue?

The docs (https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion) indicate that images are supported in /api/generate. Indeed, it works fine for local models:

$ echo '{"model":"qwen2.5vl","prompt":"describe","images":["'"$(base64 -w0 ./picture.png)"'"],"stream":false}' | curl -s localhost:11434/api/generate -d @- | jq -r .response
The image shows a small, fluffy white puppy sitting on a concrete surface. The puppy has a soft, white coat and is wearing a red collar with a small bell attached to it. The background appears to be indoors, with a dark wooden structure visible in the upper part of the image. The puppy looks curious and alert, with its ears perked up and its eyes wide open. The overall scene is calm and endearing, capturing a moment of the puppy's life.

However, Cloud models treat the API request as if there is no image, and either ask for more info or hallucinate a response:

$ echo '{"model":"qwen3-vl:235b-cloud","prompt":"describe","images":["'"$(base64 -w0 ./picture.png)"'"],"stream":false}' | curl -s localhost:11434/api/generate -d @- | jq -r .response
Sure! But I need a bit more context — “describe” what? 😊

You could be asking me to describe:

- A person, place, thing, or concept  
- An image or scene (if you can provide details)  
- A feeling, emotion, or experience  
- A process or how something works  
- A fictional world, character, or story  
- Or even *you* — if you’d like me to describe you based on what you’ve shared

Just give me a subject, and I’ll paint you a vivid, detailed picture — whether it’s poetic, technical, humorous, or whatever tone you prefer!

What would you like me to describe? 🌟

Direct connect has the same issue:

$ echo '{"model":"qwen3-vl:235b","prompt":"describe","images":["'"$(base64 -w0 ./picture.png)"'"],"stream":false}' | curl -s -H "Authorization: Bearer $OLLAMA_API_KEY" https://ollama.com/api/generate -d @- | jq -r .response
Sure! But I need a bit more context — could you clarify what you’d like me to describe?

For example:
- A person, place, object, or concept?
- A scene or event?
- A feeling or emotion?
- Something from a book, movie, or game?

Just say the word or phrase you’d like described, and I’ll give you a rich, detailed description. 😊

Go ahead — what should I describe?

/api/chat works as expected:

$ echo '{"model":"qwen3-vl:235b","messages":[{"role":"user","content":"describe","images":["'"$(base64 -w0 ./picture.png)"'"]}],"stream":false}' | curl -s -H "Authorization: Bearer $OLLAMA_API_KEY" https://ollama.com/api/chat -d @- | jq -r .message.content
This is an adorable, fluffy white puppy sitting upright on a stone or tiled step. The puppy has a soft, thick coat, round dark eyes, and a small black nose. Its ears are slightly floppy, and it’s looking off to the side with a curious, alert expression.

Around its neck is a bright red collar with a small, shiny gold bell attached — adding a charming, playful touch. The background is softly blurred, drawing focus to the puppy, while the textured gray step it’s sitting on provides a neutral, earthy contrast to its pure white fur.

The overall mood is sweet, innocent, and heartwarming — capturing a quiet, tender moment of a young pup exploring its surroundings.
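
For comparison, the request bodies used above differ only in where `images` is placed: at the top level for /api/generate, and inside the user message for /api/chat. The sketch below builds both bodies from the same image without making any network call. The function names (`b64_oneline`, `generate_payload`, `chat_payload`) are hypothetical helpers, not part of Ollama, and the prompt is not JSON-escaped, so this only suits simple prompts without quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ollama): they only build request bodies.

# Base64-encode a file onto a single line. `base64 -w0` is GNU-specific,
# so newlines are stripped with tr instead, which also works with BSD base64.
b64_oneline() {
  base64 < "$1" | tr -d '\n'
}

# Body for /api/generate: "images" sits at the top level, next to "prompt".
# Assumption: model and prompt contain no characters that need JSON escaping.
generate_payload() {
  local model=$1 prompt=$2 image=$3
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}' \
    "$model" "$prompt" "$(b64_oneline "$image")"
}

# Body for /api/chat: "images" sits inside the user message.
chat_payload() {
  local model=$1 prompt=$2 image=$3
  printf '{"model":"%s","messages":[{"role":"user","content":"%s","images":["%s"]}],"stream":false}' \
    "$model" "$prompt" "$(b64_oneline "$image")"
}
```

Either body can then be piped into the same curl invocations shown in this report, for example `generate_payload qwen3-vl:235b-cloud describe ./picture.png | curl -s localhost:11434/api/generate -d @- | jq -r .response`.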

This leads to unexpected behavior when doing one-shot completions:

$ ollama run qwen2.5vl describe ./picture.png
Added image './picture.png'
The image shows a small, fluffy white puppy sitting on a concrete surface. The puppy has a soft, white coat and is wearing a red collar with a small bell attached to it. The background appears to be indoors, with a dark wooden structure visible in the upper part of the image. The puppy looks curious and alert, with its ears perked up and its eyes wide open. The overall scene is calm and endearing, capturing a moment of the puppy's life.

$ ollama run qwen3-vl:235b-cloud describe ./picture.png
Added image './picture.png'
Sure! But I need a little more context — could you clarify what you’d like me to describe? For example:

- A person, place, object, or concept?
- A scene, emotion, or experience?
- Something abstract like “freedom” or “joy”?
- Or perhaps you meant to ask me to describe *you* or *yourself*?

Just let me know — I’m here to help! 😊

There is an open PR (https://github.com/ollama/ollama/pull/12670) that mentions reworking /generate, so perhaps this is WAI (working as intended).

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the cloud and bug labels 2026-04-12 21:10:49 -05:00

@maternion commented on GitHub (Oct 28, 2025):

Not very important, but the PR link isn't working for me. Maybe some issue with the markdown.


@gr4ceG commented on GitHub (Oct 30, 2025):

hey! i believe this issue has been fixed with one of our recent changes. going to close it for now, feel free to reopen it and let me know if you still see any issues!


Reference: github-starred/ollama#8484