[GH-ISSUE #327] Embedding model support #142

Closed
opened 2026-04-12 09:40:23 -05:00 by GiteaMirror · 18 comments
Owner

Originally created by @jmorganca on GitHub (Aug 11, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/327

Originally assigned to: @jmorganca on GitHub.

Add embedding models to use primarily with `/api/embeddings`:

  • `instructor-xl`
  • `bge-large`
  • `all-MiniLM-L6-v2`

See the full [leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
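For reference, the request/response shape of the `/api/embeddings` endpoint can be sketched as follows. This is a minimal sketch: the model name `all-minilm` and the sample response values are illustrative, not taken from this issue.

```python
import json

# Shape of a POST body for Ollama's /api/embeddings endpoint
# (the model name is illustrative; any pulled embedding-capable
# model can be substituted).
payload = json.dumps({
    "model": "all-minilm",
    "prompt": "The sky is blue because of Rayleigh scattering",
})

# The endpoint responds with a JSON object containing a single
# "embedding" field: a list of floats whose length is the model's
# embedding dimension (e.g. 384 for all-MiniLM-L6-v2). The values
# below are placeholders for illustration only.
sample_response = '{"embedding": [0.1, -0.2, 0.3]}'
embedding = json.loads(sample_response)["embedding"]
```

The returned vector is what downstream retrieval code compares, typically by cosine similarity.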

GiteaMirror added the `feature request` and `model` labels 2026-04-12 09:40:23 -05:00

@brunnolou commented on GitHub (Oct 18, 2023):

Yes, please! Any of these embedding models above `text-embedding-ada-002` would be a great addition.

I've tried Llama 2 and Mistral models with `/api/embeddings` as is, and I'm getting poor-quality similarity scores.
Even with almost identical queries, it fails to retrieve results. Are there some prompting techniques to improve the embedding quality?

Anyway, in comparison, I've tried [Xenova/gte-small](https://huggingface.co/Xenova/gte-small) with transformers and it is much faster and yields better results.
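For context, the similarity scores discussed in this thread are typically cosine similarities between embedding vectors; a minimal, dependency-free sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

A retrieval pipeline ranks candidate texts by this score against the query's embedding, which is why poor embeddings translate directly into poor retrieval.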


@corani commented on GitHub (Nov 17, 2023):

`jinaai/jina-embeddings-v2-base-en` (and other variants) also look promising.


@sandangel commented on GitHub (Dec 7, 2023):

Hi, is there an update on this issue? I would love to contribute.


@corani commented on GitHub (Dec 8, 2023):

I've been playing with https://github.com/nlpodyssey/cybertron, which is pure Go (but I guess CPU-only?) and at least supports `all-MiniLM-L6-v2`, `e5-*-v2`, `bge-*-en-v1.5`, and `ember-v1`.

I did some testing with the [STS-2016](https://alt.qcri.org/semeval2016/task1/) dataset and got the accuracies below, compared to `llama2` and `mistral:instruct` (Pearson correlation with the gold answers):

  • Ollama
    • llama2: 0.23431
    • mistral:instruct: 0.5656
  • Cybertron
    • all-MiniLM-L6-v2: 0.80344
    • e5-small-v2: 0.82318
    • e5-base-v2: 0.83845
    • bge-small-en-v1.5: 0.84514
    • bge-base-en-v1.5: 0.85297

So I agree with the previous comment that the embeddings generated by the completion models are pretty bad!
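For context, the Pearson correlation used in the benchmark above measures how linearly the model's similarity scores track the gold similarity judgments; a minimal sketch (the numbers below are illustrative, not STS data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# A perfectly linear relationship scores 1.0 regardless of offset.
gold = [0.0, 1.0, 2.0, 3.0]
predicted = [0.1, 1.1, 2.1, 3.1]
print(round(pearson(gold, predicted), 9))  # 1.0
```

In the STS setup, `xs` would be the gold similarity ratings and `ys` the cosine similarities produced by each model's embeddings.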


@sandangel commented on GitHub (Dec 8, 2023):

That is interesting. For GPU support, I guess we will need to use https://github.com/skeskinen/bert.cpp?
I think the implementation would be something similar to llama.cpp?


@sandangel commented on GitHub (Dec 8, 2023):

I found this issue: https://github.com/ggerganov/llama.cpp/issues/2872
I think they plan to implement it in llama.cpp, so maybe we will just need to update llama.cpp when it's done?


@sandangel commented on GitHub (Dec 11, 2023):

I also found this https://github.com/ml-explore/mlx-examples/blob/main/bert/README.md, which we can use to run inference on an M1 Mac. Is it possible to support `mlx` with Ollama?


@CodeWithKyrian commented on GitHub (Dec 29, 2023):

Any update on this, or plans to allow BERT models?


@tjohnson4 commented on GitHub (Jan 20, 2024):

Any update on this issue?


@ymohamed08 commented on GitHub (Jan 20, 2024):

Do you have any updates so far? Very interested to contribute.


@ill-yes commented on GitHub (Feb 2, 2024):

Any updates here?


@easp commented on GitHub (Feb 3, 2024):

Plans to support BERT models in llama.cpp stalled out when the dev who had taken on the task ended up focusing on something else. In the last few days the project management artifacts were updated to acknowledge this, and it looks like there has been some activity since, so maybe there will be working code soon:
https://github.com/ggerganov/llama.cpp/issues/2872


@AndreBerzun commented on GitHub (Feb 15, 2024):

BERT support was [merged](https://github.com/ggerganov/llama.cpp/pull/5423) 3 days ago into llama.cpp


@easp commented on GitHub (Feb 15, 2024):

Looks like there are still kinks being worked out.


@s-kostyaev commented on GitHub (Feb 15, 2024):

> Looks like there are still kinks being worked out.

Link to check the progress: https://github.com/ggerganov/llama.cpp/pull/5500


@s-kostyaev commented on GitHub (Feb 15, 2024):

> > Looks like there are still kinks being worked out.
>
> Link to check the progress: [ggerganov/llama.cpp#5500](https://github.com/ggerganov/llama.cpp/pull/5500)

It is merged now


@AndreBerzun commented on GitHub (Feb 19, 2024):

@jmorganca just wanted to follow up and see if this topic is on your roadmap. Since llama.cpp added support for BERT models, this seems like a great low-hanging fruit, no?

Initial support for BERT models has been merged with [ggerganov/llama.cpp#5423](https://github.com/ggerganov/llama.cpp/pull/5423) and released with [b2127](https://github.com/ggerganov/llama.cpp/releases/tag/b2127). Some kinks related to embedding pooling were fixed with [ggerganov/llama.cpp#5500](https://github.com/ggerganov/llama.cpp/pull/5500). [Batch embedding](https://github.com/ggerganov/llama.cpp/pull/5466) is supported as well.

There has been a new bug related to the [tokenizer implementation](https://github.com/ggerganov/llama.cpp/issues/5496), but that's it as [far as I can tell](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+is%3Aopen+bert).


@jmorganca commented on GitHub (Feb 20, 2024):

@AndreBerzun it absolutely is – working on it!

Reference: github-starred/ollama#142