[GH-ISSUE #1772] Metadata field for multimodal models #26777

Closed
opened 2026-04-22 03:22:44 -05:00 by GiteaMirror · 3 comments

Originally created by @shreyaskarnik on GitHub (Jan 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1772

Would it be possible to add some metadata to the model indicating that it is multimodal? This will help to select the right model in applications that are built on top of the API to support multimodal architecture. I believe this will also help to search through models at https://ollama.ai/library and filter based on multimodal support.


@pdevine commented on GitHub (Jan 4, 2024):

This is possible already through the API by using the `/api/show` endpoint and looking through `.details.families`. An example with curl:

```
curl localhost:11434/api/show -X POST -d '{"name": "llava"}' | jq ".details.families"
```

It should output:

```
[
  "llama",
  "clip"
]
```

It shows the two families the model uses (i.e. the "multimodal" part): the "llama" model family for the chat portion, and the "clip" family, which is used for converting images into text descriptions.
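For applications built on top of the API, the check above can be done programmatically. Below is a minimal sketch that decides multimodality from a `families` list (as would be returned under `.details.families` by `/api/show`); the helper name `is_multimodal` and the set of vision families are illustrative assumptions, not part of the Ollama API.

```python
# Sketch: infer multimodal support from a model's `.details.families` list.
# The set of vision-capable families is an assumption for illustration;
# "clip" is the family llava-style models report for image support.
VISION_FAMILIES = {"clip"}

def is_multimodal(families):
    """Return True if any listed family is a known vision family."""
    return bool(families) and any(f in VISION_FAMILIES for f in families)

print(is_multimodal(["llama", "clip"]))  # llava-style model -> True
print(is_multimodal(["llama"]))          # text-only model -> False
print(is_multimodal(None))               # field missing -> False
```

In practice the `families` list would come from a POST to `/api/show` as shown in the curl example above; the function itself only inspects the returned list.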

Hopefully that helps! I'm going to close the issue, but feel free to keep commenting on it/re-open it.


@shreyaskarnik commented on GitHub (Jan 4, 2024):

@pdevine thanks for the clarification. I was using the "clip" family to select the appropriate model, so thanks for confirming that this is the way to go. On the https://ollama.ai/library page, will there be an option to filter for multimodal models in the future?


@pdevine commented on GitHub (Jan 4, 2024):

Sounds like a really good addition.


Reference: github-starred/ollama#26777