[GH-ISSUE #2726] Ollama 01.26 embeddings, alternative Models? #48150

Closed
opened 2026-04-28 06:52:09 -05:00 by GiteaMirror · 10 comments

Originally created by @Daniel07n on GitHub (Feb 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2726

Hi, is it possible to load alternative embedding models other than BERT and Nomic? Like the larger LLMs, either via the list shown on Ollama.com or as a manual download from Hugging Face?


@wrapss commented on GitHub (Feb 24, 2024):

this works literally the same way as with chat models: you need to find an embedding model in GGUF format and use it in a Modelfile (see https://github.com/ollama/ollama/blob/main/docs/modelfile.md).
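
For illustration, a minimal sketch of that flow (the GGUF filename and model name here are placeholders, not a real download):

```
# Modelfile -- point FROM at the downloaded embedding GGUF
FROM ./my-embedding-model.gguf
```

Then `ollama create my-embedding-model -f Modelfile` imports it, and it can be queried through the `/api/embeddings` endpoint like any other model.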


@shuther commented on GitHub (Feb 26, 2024):

Maybe I am confused, but I am not sure I understand how embedding works with Ollama. Usually the embedding model is different from the chat model (e.g. intfloat/multilingual-e5-small vs. GPT-4), so I am confused about what Ollama is doing when we hit the /api/embeddings endpoint with the model mistral (is it BERT, nomic-embed, something else?). Also, I did not find any example related to embedding on https://ollama.com/library - is there any specialized model for embedding, or documentation on how to get one (such as intfloat/multilingual-e5-small)?


@wrapss commented on GitHub (Feb 26, 2024):

> Maybe I am confused, but I am not sure I understand how embedding works with Ollama. Usually the embedding model is different from the chat model (e.g. intfloat/multilingual-e5-small vs. GPT-4), so I am confused about what Ollama is doing when we hit the /api/embeddings endpoint with the model mistral (is it BERT, nomic-embed, something else?). Also, I did not find any example related to embedding on https://ollama.com/library - is there any specialized model for embedding, or documentation on how to get one (such as intfloat/multilingual-e5-small)?

https://ollama.com/library?sort=newest


@wrapss commented on GitHub (Feb 26, 2024):

> Maybe I am confused, but I am not sure I understand how embedding works with Ollama. Usually the embedding model is different from the chat model (e.g. intfloat/multilingual-e5-small vs. GPT-4), so I am confused about what Ollama is doing when we hit the /api/embeddings endpoint with the model mistral (is it BERT, nomic-embed, something else?). Also, I did not find any example related to embedding on https://ollama.com/library - is there any specialized model for embedding, or documentation on how to get one (such as intfloat/multilingual-e5-small)?

and Ollama uses llama.cpp for inference, so you'll need to check on their side how it works.


@netroy commented on GitHub (Feb 29, 2024):

One of the [top performing](https://huggingface.co/spaces/mteb/leaderboard) embedding models (`SFR-Embedding-Mistral`) is [available as GGUF on Huggingface](https://huggingface.co/dranger003/SFR-Embedding-Mistral-GGUF).

This is how I imported it into Ollama:

1. downloaded the `q4_k_m` file from [here](https://huggingface.co/dranger003/SFR-Embedding-Mistral-GGUF/blob/main/ggml-sfr-embedding-mistral-q4_k_m.gguf)
2. created a `Modelfile` with the text `FROM ./ggml-sfr-embedding-mistral-q4_k_m.gguf`
3. ran `ollama create sfr-embedding-mistral:q4_k_m -f Modelfile` to import the model

Now I can use the embeddings endpoint with:

```
curl http://localhost:11434/api/embeddings -d '{ "model": "sfr-embedding-mistral:q4_k_m", "prompt": "HERE GOES YOUR TEXT"}'
```
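
For anyone scripting this, a minimal Python sketch of the same call, using the `requests` library and assuming the model tag from the steps above:

```python
# Minimal sketch: POST to Ollama's embeddings endpoint and read back the
# vector. Assumes the sfr-embedding-mistral:q4_k_m tag created above.
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={
        "model": "sfr-embedding-mistral:q4_k_m",
        "prompt": "HERE GOES YOUR TEXT",
    },
)
resp.raise_for_status()
embedding = resp.json()["embedding"]  # a list of floats
print(len(embedding))  # dimensionality of the embedding
```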

@tolasing commented on GitHub (Mar 1, 2024):

@netroy can you share the template format in the Modelfile for SFR-Embedding-Mistral?


@netroy commented on GitHub (Mar 2, 2024):

@tolas92 I don't know if there is a template format for embedding models, but I'm not familiar enough with this side of things, so all I can say is that I have no answer for you.


@timtensor commented on GitHub (Mar 9, 2024):

Is there documentation on how to plug this into a RAG application? Right now I am trying to use Hugging Face for the embeddings and Ollama for the LLM, and it is a bit messy at the moment. Or maybe this is not possible?


@jmorganca commented on GitHub (Mar 12, 2024):

Hi there, `multilingual-e5-small` and other BERT architecture models should be supported by Ollama; you can import them by following https://github.com/jmorganca/ollama/blob/main/docs/import.md (you shouldn't need a `TEMPLATE` command for embedding models). Let me know if you hit any issues!


@longregen commented on GitHub (Apr 2, 2024):

> @netroy can you share the template format in the Modelfile for SFR-Embedding-Mistral?

Indeed, note how the model requires a particular format, `"Instruct: {}\nQuery: {}"`, per [the SFR-Embedding-Mistral model card](https://huggingface.co/Salesforce/SFR-Embedding-Mistral):

```python
# Imports and model setup added for completeness; this assumes the
# sentence-transformers example from the model card.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Salesforce/SFR-Embedding-Mistral")

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'How to bake a chocolate cake'),
    get_detailed_instruct(task, 'Symptoms of the flu')
]
# No need to add instruction for retrieval documents
passages = [
    "To bake a delicious chocolate cake, you'll need the following ingredients: all-purpose flour, sugar, cocoa powder, baking powder, baking soda, salt, eggs, milk, vegetable oil, and vanilla extract. Start by preheating your oven to 350°F (175°C). In a mixing bowl, combine the dry ingredients (flour, sugar, cocoa powder, baking powder, baking soda, and salt). In a separate bowl, whisk together the wet ingredients (eggs, milk, vegetable oil, and vanilla extract). Gradually add the wet mixture to the dry ingredients, stirring until well combined. Pour the batter into a greased cake pan and bake for 30-35 minutes. Let it cool before frosting with your favorite chocolate frosting. Enjoy your homemade chocolate cake!",
    "The flu, or influenza, is an illness caused by influenza viruses. Common symptoms of the flu include a high fever, chills, cough, sore throat, runny or stuffy nose, body aches, headache, fatigue, and sometimes nausea and vomiting. These symptoms can come on suddenly and are usually more severe than the common cold. It's important to get plenty of rest, stay hydrated, and consult a healthcare professional if you suspect you have the flu. In some cases, antiviral medications can help alleviate symptoms and reduce the duration of the illness."
]

embeddings = model.encode(queries + passages)
scores = util.cos_sim(embeddings[:2], embeddings[2:]) * 100
print(scores.tolist())
```
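
To use that format through Ollama rather than sentence-transformers, a hypothetical sketch is to prepend the instruct prefix to queries (but not to documents) before hitting the embeddings endpoint; the model tag assumes netroy's import above:

```python
# Hypothetical sketch: apply the SFR instruct format when embedding
# queries via Ollama's /api/embeddings endpoint. Documents are embedded
# without the prefix, matching the model card's usage.
import requests

def embed(text: str) -> list[float]:
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "sfr-embedding-mistral:q4_k_m", "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

task = "Given a web search query, retrieve relevant passages that answer the query"
query_vec = embed(f"Instruct: {task}\nQuery: How to bake a chocolate cake")
doc_vec = embed("To bake a delicious chocolate cake, you'll need ...")
```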