[GH-ISSUE #3477] Support CLIP in LLaVA to provide services externally #64179

Open
opened 2026-05-03 16:28:34 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @Andiedie on GitHub (Apr 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3477

What are you trying to do?

I am new to ollama (including llama.cpp, of course), so my questions may be a bit silly.

My use case is to serve both CLIP and LLaVA (which combines CLIP and Mistral) at the same time.

LLaVA can run perfectly on ollama. But I need to open another service for CLIP.

What I want to ask is

  1. Can ollama support the CLIP embedding interface? The current embedding interface seems to only support text.
  2. Since the frozen CLIP model is already included in the LLaVA model running on ollama, can it be reused directly instead of loading two copies into memory?
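
For context on point 1, a minimal sketch of the current text-only embedding call, assuming a local Ollama server at its default address (`localhost:11434`) and a pulled model named `llama2` (both assumptions, not part of this issue). Note the payload carries only a `prompt` string; there is no field for an image:

```python
import json
import urllib.request

# Default Ollama server address (assumption: local install, default port).
OLLAMA_URL = "http://localhost:11434/api/embeddings"

def build_embed_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for Ollama's text-only /api/embeddings endpoint."""
    return {"model": model, "prompt": prompt}

def embed_text(model: str, prompt: str) -> list:
    """POST the payload to a running Ollama server and return the embedding vector."""
    payload = json.dumps(build_embed_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

if __name__ == "__main__":
    # Requires a running Ollama server with a text model pulled; prints the
    # first few dimensions of the embedding vector.
    print(embed_text("llama2", "hello world")[:4])
```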

How should we solve this?

No response

What is the impact of not solving this?

No response

Anything else?

No response


@igorschlum commented on GitHub (Apr 4, 2024):

Hi @Andiedie
LLaVA uses CLIP internally to understand images, but you can't query CLIP directly through LLaVA.
There's currently no way to use CLIP independently of LLaVA.

Here are some options:

You could try to build a new model for Ollama from the Hugging Face CLIP code, or someone may already have done so and published it.
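
In the meantime, a separate CLIP service can be run outside Ollama with the Hugging Face `transformers` library. A minimal sketch, assuming the `transformers` and `Pillow` packages and the `openai/clip-vit-base-patch32` checkpoint (the checkpoint name and the `cat.jpg` file are illustrative assumptions); this runs its own copy of CLIP, so it does not reuse the weights already loaded inside LLaVA:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    # Heavy dependencies kept inside the guard; the checkpoint is downloaded
    # from the Hugging Face Hub on first run.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Example inputs (assumed): a local image file and a caption to compare it to.
    image = Image.open("cat.jpg")
    inputs = processor(text=["a photo of a cat"], images=image,
                       return_tensors="pt", padding=True)

    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])[0]
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])[0]
    print(cosine_similarity(image_emb.tolist(), text_emb.tolist()))
```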


Reference: github-starred/ollama#64179