[GH-ISSUE #2608] How to identify multimodal models? #27296

Closed
opened 2026-04-22 04:31:15 -05:00 by GiteaMirror · 3 comments

Originally created by @gluonfield on GitHub (Feb 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2608

Hi guys, incredible work with Ollama!

I'm building a client for Ollama and wondering what the best way is to identify multimodal models like llava and bakllava from the API. I want to display additional UI if a model supports images.

It seems that for both llava and bakllava, the /api/tags response contains clip in the families list:

    {
      ...
      "details": {
        "families": ["clip"]
      }
    }

Should clip be treated as an indicator of a model's image support?

GiteaMirror added the question label 2026-04-22 04:31:15 -05:00

@BruceMacD commented on GitHub (Feb 20, 2024):

Hey @AugustDev, you're correct. The "clip" family indicates that a model is multimodal. That is how we detect multimodal models in our CLI right now too.

Resolving this one for now, let me know if you have any follow-up questions. Happy to help out.
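
As a rough illustration of the check described above, a client could look for clip in the families list of each parsed /api/tags model entry. This is only a sketch: the helper name `has_clip_family` and the sample entries are hypothetical, and the entry shape is assumed from the snippet in the question.

```python
from typing import Any


def has_clip_family(model_entry: dict[str, Any]) -> bool:
    """Return True if the entry lists "clip" under details.families."""
    # Treat a missing or null "details" / "families" as "no families".
    details = model_entry.get("details") or {}
    families = details.get("families") or []
    return "clip" in families


# Hypothetical entries, shaped like the snippet in the question.
llava = {"name": "llava:latest", "details": {"families": ["llama", "clip"]}}
text_only = {"name": "example-text:latest", "details": {"families": ["llama"]}}
no_families = {"name": "example-bare:latest", "details": {"families": None}}

for entry in (llava, text_only, no_families):
    print(entry["name"], has_clip_family(entry))
```

A model without a clip entry is reported as text-only here, so the result is only as reliable as the families metadata itself.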


@JHubi1 commented on GitHub (Mar 30, 2025):

How is this dealt with right now? More recent models like gemma3 and llama3.2-vision don't have the clip family anymore. Is there a way to do this through the API?


@BruceMacD commented on GitHub (Mar 31, 2025):

@JHubi1 The Ollama CLI was just checking for specific fields in the response from Ollama's /show API, but this was not ideal, as it put the burden on the client to keep up with how to check for a model's capabilities.

I've opened a pull request, #10066, to return a model's capabilities explicitly from the /show API endpoint.
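
Once capabilities are reported explicitly, a client-side check could shrink to something like the sketch below. It assumes the parsed /show response carries a `capabilities` list and that image support appears there as the string "vision"; neither the field name nor the value is confirmed in this thread, and `supports_images` is a hypothetical helper. When the field is absent, it falls back to the older clip family check.

```python
from typing import Any


def supports_images(show_response: dict[str, Any]) -> bool:
    """Guess image support from a parsed /show response (assumed shape)."""
    # Assumption: capabilities is a list of strings such as "vision".
    capabilities = show_response.get("capabilities")
    if capabilities is not None:
        return "vision" in capabilities

    # Fallback for responses without capabilities: the clip family check.
    details = show_response.get("details") or {}
    families = details.get("families") or []
    return "clip" in families


# Hypothetical responses used only to exercise both branches.
with_capabilities = {"capabilities": ["completion", "vision"]}
text_only = {"capabilities": ["completion"]}
legacy_clip = {"details": {"families": ["llama", "clip"]}}

for response in (with_capabilities, text_only, legacy_clip):
    print(supports_images(response))
```

Preferring the explicit list over the family check keeps the client from having to track how each new model family signals image support.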


Reference: github-starred/ollama#27296