[GH-ISSUE #13462] Feature request: Enable image input for vision-capable cloud models #34643

Closed
opened 2026-04-22 18:22:59 -05:00 by GiteaMirror · 3 comments

Originally created by @eitelnick on GitHub (Dec 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13462

Cloud models that have vision-capable variants upstream (for example families like Ministral 3 and GLM-4.6/GLM-4.6V) only expose text in Ollama Cloud today (ministral-3:cloud, glm-4.6:cloud).

I’d like to request that, whenever a cloud model has a vision-capable variant, Ollama Cloud exposes image input for it across:

- Official UIs (image upload / drag-and-drop)
- Native API / SDKs (images: [...] or equivalent; a sketch follows this list)
- OpenAI-compatible /v1/chat/completions endpoint
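For concreteness, a minimal sketch of the native-API shape this implies, assuming the documented /api/chat images field (a list of base64-encoded images), a local instance at localhost:11434 signed in to Ollama Cloud, and that ministral-3:cloud gains vision support (that last part is precisely the request):

```python
import base64

import requests  # third-party: pip install requests

# Placeholder image file and model tag; assumes ministral-3:cloud accepts
# images and is reachable through a signed-in local Ollama instance.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "ministral-3:cloud",
        "messages": [
            {
                "role": "user",
                "content": "Describe this screenshot.",
                # Native API convention: base64-encoded images per message.
                "images": [image_b64],
            }
        ],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```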

Current vs expected behavior

Current:

- Local vision models can accept images via CLI and native API.
- Cloud models like ministral-3:cloud and glm-4.6:cloud only support text, even though their families have strong multimodal variants elsewhere.

Expected:

- If a cloud model is vision-capable (or has a vision sibling), it should:
  - Show an image upload option in the UI.
  - Accept images via images (native API/SDKs).
  - Accept images via the OpenAI-compatible /v1 API in a documented format (see the sketch after this list).
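And the OpenAI-compatible shape, assuming Ollama's documented support for image_url content parts carrying base64 data URLs on /v1/chat/completions; whether the cloud tag accepts it is exactly what this issue asks for:

```python
import base64

from openai import OpenAI  # third-party: pip install openai

# Point the official OpenAI SDK at the local Ollama-compatible endpoint;
# the api_key value is unused by Ollama but required by the SDK.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

with open("diagram.png", "rb") as f:  # placeholder file
    data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="ministral-3:cloud",  # assumption: cloud tag exposes vision
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this diagram show?"},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
)
print(resp.choices[0].message.content)
```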

Why this would help

- Parity: Cloud models don’t feel artificially limited vs local vision models.
- DX: One simple rule of thumb: “if it’s a vision model (or family), I can always send images, local or cloud.”
- Use cases: Code-from-screenshot, UI analysis, diagram understanding, and other visual workflows benefit most from large cloud models.

If there are constraints or a roadmap item for cloud vision support, it’d be great to know. Otherwise, please consider making image/vision support a first-class, consistent feature for all applicable cloud models.

GiteaMirror added the feature request label 2026-04-22 18:22:59 -05:00

@rick-github commented on GitHub (Dec 14, 2025):

glm-4.6 is not a vision model. glm-4.6v is a vision model but is not supported in Cloud at the moment.

The following Cloud models are vision capable and accept images via the CLI and Ollama/OpenAI APIs, with the exception of the gemma3 models, which respond with an internal server error (https://github.com/ollama/ollama/issues/13464) on the OpenAI API:

qwen3-vl:235b-instruct
qwen3-vl:235b
ministral-3:3b
ministral-3:8b
ministral-3:14b
mistral-large-3:675b
devstral-small-2:24b
gemini-3-pro-preview
gemma3:4b
gemma3:12b
gemma3:27b

I assume they will also accept image uploads through the UI but I don't have access to a Mac/Windows machine to check.
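If you want to check this programmatically rather than model by model, recent Ollama builds report a capabilities list from /api/show; a quick sketch, with the caveat that the capabilities field name and the exact cloud tag spellings are assumptions to adjust for your install:

```python
import requests  # third-party: pip install requests

# Models from the list above; tags are copied verbatim and assume a
# signed-in local instance that can resolve the cloud variants.
MODELS = [
    "qwen3-vl:235b-instruct", "qwen3-vl:235b",
    "ministral-3:3b", "ministral-3:8b", "ministral-3:14b",
    "mistral-large-3:675b", "devstral-small-2:24b",
    "gemini-3-pro-preview", "gemma3:4b", "gemma3:12b", "gemma3:27b",
]

for name in MODELS:
    r = requests.post("http://localhost:11434/api/show", json={"model": name})
    caps = r.json().get("capabilities", [])  # e.g. ["completion", "vision"]
    print(f"{name}: {'vision' if 'vision' in caps else 'no vision flag'}")
```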


@eitelnick commented on GitHub (Dec 14, 2025):

Thanks for the response! The documentation, for example for Ministral-3 (https://ollama.com/library/ministral-3), says "text" only for Cloud models such as ministral-3:14b-cloud. In the UI it does seem to accept image uploads, so this might just be a documentation fix.


@rick-github commented on GitHub (Dec 14, 2025):

#13468

Reference: github-starred/ollama#34643