[GH-ISSUE #11243] Multi-Modal Support #53919

Open
opened 2026-04-29 04:57:21 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @Vinay-Umrethe on GitHub (Jun 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11243

CURRENT OLLAMA SUPPORTS :

  1. Text Generation
  2. Image Understanding With Multiple Images As Well With Vision Models.

WHAT IT NEEDS :

Models Like openbmb/MiniCPM-o-2_6 or openbmb/MiniCPM-o-2_6-gguf are great models which supports

AUDIO/TEXT/IMAGE/LIVE-VIDEO INPUT

and can generate

TEXT/AUDIO-SPEECH OUTPUT

they have there models on ollama avilable but current ollama features LIMITS it's capabilities by only image/text input and text output.

PLEASE ADD

  1. VIDEO, AUDIO - INPUT
  2. AUDIO - OUTPUT

ALSO REST API FOR IT UPDATE DOCUMENTATIONS IF DONE, HOPE THIS FEATURE WILL BE ADDED SOON...

IT'S GOOD FOR RUNNING OMNI MODELS.

Originally created by @Vinay-Umrethe on GitHub (Jun 30, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/11243 ### CURRENT OLLAMA SUPPORTS : 1. Text Generation 2. Image Understanding With Multiple Images As Well With Vision Models. ### WHAT IT NEEDS : Models Like `openbmb/MiniCPM-o-2_6` or `openbmb/MiniCPM-o-2_6-gguf` are great models which supports > **AUDIO/TEXT/IMAGE/LIVE-VIDEO INPUT** and can generate > **TEXT/AUDIO-SPEECH OUTPUT** they have there models on ollama avilable but current ollama features LIMITS it's capabilities by only image/text input and text output. ### PLEASE ADD 1. VIDEO, AUDIO - INPUT ✨ 2. AUDIO - OUTPUT ✨ ALSO REST API FOR IT UPDATE DOCUMENTATIONS IF DONE, HOPE THIS FEATURE WILL BE ADDED SOON... IT'S GOOD FOR RUNNING ⚡OMNI MODELS.
GiteaMirror added the feature request label 2026-04-29 04:57:21 -05:00
Author
Owner

@profnagol commented on GitHub (Jun 30, 2025):

This is currently the only feature I think is missing that deserve attention, if there is one thing to do, that's it.

<!-- gh-comment-id:3019704167 --> @profnagol commented on GitHub (Jun 30, 2025): This is currently the only feature I think is missing that deserve attention, if there is one thing to do, that's it.
Author
Owner

@wfcola commented on GitHub (Jul 9, 2025):

Really needed! 90% time can be saved for testing different m-model.

<!-- gh-comment-id:3051044212 --> @wfcola commented on GitHub (Jul 9, 2025): Really needed! 90% time can be saved for testing different m-model.
Author
Owner

@chllei commented on GitHub (Jul 29, 2025):

I also require this feature. Do you have any updates from Ollama regarding the new engine?

<!-- gh-comment-id:3131095723 --> @chllei commented on GitHub (Jul 29, 2025): I also require this feature. Do you have any updates from Ollama regarding the new engine?
Author
Owner

@Vinay-Umrethe commented on GitHub (Apr 10, 2026):

I'd advise to move to llama.cpp (which is also core of Ollama) specifically llama-server it provides WebUI + supports direct Audio, Image, PDF, inputs if the model supports it...

Keeping this OPEN anyway

<!-- gh-comment-id:4225196495 --> @Vinay-Umrethe commented on GitHub (Apr 10, 2026): I'd advise to move to `llama.cpp` (which is also core of Ollama) specifically `llama-server` it provides WebUI + supports direct Audio, Image, PDF, inputs if the model supports it... Keeping this OPEN anyway
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#53919