[GH-ISSUE #5304] Support for multimodal embedding models #3327

Open
opened 2026-04-12 13:54:34 -05:00 by GiteaMirror · 13 comments
Owner

Originally created by @k0marov on GitHub (Jun 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5304

Hi! I can't find a REST API endpoint for generating embeddings from an image, that is, support for models like CLIP that accept both text and images as input.
These models are very useful in many applications, such as semantic image search and classification.

Note: I'll be glad to contribute by implementing support for this.
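
To illustrate the semantic image search use case, here is a minimal sketch. It assumes a CLIP-style model has already mapped a text query and a few images into the same vector space. The file names and vectors are made up for illustration, and no Ollama endpoint is called, since this issue is precisely about such an endpoint being missing.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_embedding, vec))
        for name, vec in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real model would return hundreds of dimensions.
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Pretend this vector is the text embedding of "a photo of a cat".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_images(query_embedding, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because text and images share one embedding space, the same ranking function would work for image-to-image search by passing an image embedding as the query.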

GiteaMirror added the feature request label 2026-04-12 13:54:34 -05:00

@AndreBerzun commented on GitHub (Sep 2, 2024):

@jmorganca I definitely second this feature request. Is this something that's on your roadmap? In fact, is there an official roadmap or place where the core maintainers document which features have the highest priority right now?

I think vision/multimodal embedding models are going to be a game changer for RAG apps since they can basically replace entire OCR pipelines. In particular there is a new VLM called ColPali that seems to work miracles for RAG apps and I'm really eager to see it added to Ollama.

Here is the original ColPali paper: https://arxiv.org/abs/2407.01449
... and here is a great breakdown of its capabilities: https://blog.vespa.ai/retrieval-with-vision-language-models-colpali/


@pulinagrawal commented on GitHub (Oct 1, 2024):

I want to deploy some projects using Ollama, but the lack of image embeddings is limiting me. Are there other options out there?


@om2468 commented on GitHub (Nov 12, 2024):

+1. Please help implement this; it would be great to have.


@DewiarQR commented on GitHub (Feb 13, 2025):

I'm sure that with the arrival of models like this, the Ollama project will become significantly more popular and in demand. This is exactly what's needed right now. Please consider adding ColPali.


@eamonburns commented on GitHub (Mar 11, 2025):

https://github.com/ollama/ollama/issues/4296


@hoeflechner commented on GitHub (Mar 11, 2025):

I wrote a minimal service that provides embeddings for the open_clip models. I will use it until Ollama can do this.
https://github.com/hoeflechner/oclip


@shellphy commented on GitHub (Apr 15, 2025):

Is there any plan for this request?


@tjwebb commented on GitHub (Jul 29, 2025):

+1


@atarora commented on GitHub (Aug 1, 2025):

+1, it would be great to have support for multimodal embedding models!


@ajroetker commented on GitHub (Aug 5, 2025):

I'm curious what the maintainers think about this pull request: https://github.com/ollama/ollama/pull/10728


@reneleonhardt commented on GitHub (Aug 6, 2025):

I guess after one year of not replying to this feature request and 3 months of that pull request, they continue to not think about multimodal at all 😉


@dam2452 commented on GitHub (Oct 19, 2025):

bump


@youyuzzg commented on GitHub (Jan 25, 2026):

Add support for Qwen3-VL-Embedding and Qwen3-VL-Reranker model series

Reference: github-starred/ollama#3327