[GH-ISSUE #7677] Enable image embeddings for vision models #51409

Closed
opened 2026-04-28 19:54:41 -05:00 by GiteaMirror · 2 comments

Originally created by @kevin-pw on GitHub (Nov 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7677

I would love to be able to create embeddings for images with vision models like `llama3.2-vision`.

Creating image and text embeddings with a vision-capable model would make it possible to build image search and image categorization applications.

If my understanding of the shared semantic vector space of image models is correct, it should be possible to perform calculations like cosine similarity on text and image embeddings to, for example, find all the photos of puppy dogs in a random assortment of photos :)
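As a rough illustration of the cosine-similarity idea above, here is a minimal sketch that assumes text and image embeddings of the same dimension are already available from some model. The function names `cosine_similarity` and `rank_images` and the example vectors are made up for illustration; they are not part of Ollama.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings):
    """Sort (name, score) pairs by similarity to the query, best first.

    image_embeddings is a hypothetical dict mapping a file name to its
    embedding vector.
    """
    scored = [
        (name, cosine_similarity(query_embedding, vector))
        for name, vector in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real embeddings.
    text_query = [1.0, 0.0, 0.0]
    photos = {
        "puppy.jpg": [0.9, 0.1, 0.0],
        "car.jpg": [0.0, 1.0, 0.0],
    }
    for name, score in rank_images(text_query, photos):
        print(f"{name}: {score:.3f}")
```

This only works as a search if the model places text and images in the same vector space, which is the assumption stated above.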

At this time, the `generate` endpoint accepts an `images` parameter, but the `embed` endpoint does not. I tried passing an image as a base64 string to the `input` parameter of the `embed` endpoint, but the resulting embedding appears to be the vector of the text string and not of the image.

Would it be possible to expand the `embed` endpoint to accept an image parameter?
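To make the request concrete, here is a small sketch of the JSON request bodies involved, built without any network calls. The `generate` body with `images` and the `embed` body with `input` follow the behavior described above; the last function shows a purely hypothetical `images` field on `embed`, which is the feature being requested and does not exist at the time of writing. All helper names are made up for illustration.

```python
import base64
import json


def encode_image(image_bytes):
    """Base64-encode raw image bytes as an ASCII string."""
    return base64.b64encode(image_bytes).decode("ascii")


def generate_payload(model, prompt, image_bytes):
    """Body for the generate endpoint, which accepts an images list."""
    return {"model": model, "prompt": prompt, "images": [encode_image(image_bytes)]}


def embed_payload_text(model, text):
    """Body for the embed endpoint: input is treated as text.

    Passing a base64 string here just embeds the characters of that
    string, which matches the behavior described above.
    """
    return {"model": model, "input": text}


def hypothetical_embed_payload_with_images(model, image_bytes):
    """HYPOTHETICAL: the requested shape, not a supported API."""
    return {"model": model, "images": [encode_image(image_bytes)]}


if __name__ == "__main__":
    fake_image = b"abc"  # stand-in bytes, not a real image
    print(json.dumps(generate_payload("llama3.2-vision", "Describe this image", fake_image)))
    print(json.dumps(embed_payload_text("llama3.2-vision", encode_image(fake_image))))
    print(json.dumps(hypothetical_embed_payload_with_images("llama3.2-vision", fake_image)))
```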

GiteaMirror added the feature request label 2026-04-28 19:54:41 -05:00

@rick-github commented on GitHub (Nov 15, 2024):

https://github.com/ollama/ollama/issues/5304


@kevin-pw commented on GitHub (Nov 15, 2024):

Thank you for pointing me to the existing issue for my feature request. I am closing this issue as a duplicate.

Reference: github-starred/ollama#51409