[GH-ISSUE #7085] Embedding discrepancies vs SentenceTransformers and between Ollama Versions #66554

Open
opened 2026-05-04 07:24:49 -05:00 by GiteaMirror · 1 comment

Originally created by @nicksrusso on GitHub (Oct 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7085

What is the issue?

I'm working on a RAG app and would like to simplify my stack by using Ollama instead of SentenceTransformers, but I'm observing some odd behavior from Ollama embeddings. Here's what I'm seeing:

  1. Output embeddings don't match SentenceTransformers.
  2. Hitting the `/api/embed` endpoint from the [API docs](https://github.com/ollama/ollama/blob/79d3b1e2bdfc97542a7259b0c839520d39578514/docs/api.md#generate-embeddings) returns different results than the `/api/embeddings` endpoint described in the [blog post](https://ollama.com/blog/embedding-models).
  3. Embeddings returned by different versions of Ollama are different.

To test this, I embedded the same prompt with SentenceTransformers, `/api/embed`, and `/api/embeddings`. I printed the first 5 elements of each output vector and calculated a cosine similarity matrix between each pair of embedding methods (a symmetric matrix with 1s along the diagonal). I repeated this with all-minilm and nomic-embed-text. Here are the code and outputs for Ollama 0.3.3 and 0.3.12:

![code](https://github.com/user-attachments/assets/f0d3620f-4511-49d9-8e87-10a520bbeb62)

![ollama033_output](https://github.com/user-attachments/assets/1870367b-85b6-4fb7-9f07-adfaf80e7cba)

![ollama_0312](https://github.com/user-attachments/assets/81d61fcb-9749-4ed5-ba4a-ac4959587567)
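The comparison shown in the screenshots can be sketched roughly as follows. This is a minimal sketch, not the author's exact script: the model name, prompt, and default Ollama host are assumptions, and the network calls are kept inside functions so nothing runs on import.

```python
import json
import math
from urllib import request

OLLAMA = "http://localhost:11434"  # assumed default Ollama host


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def embed_new(model, text):
    """Call the current /api/embed endpoint; it returns a batch of vectors."""
    req = request.Request(
        f"{OLLAMA}/api/embed",
        data=json.dumps({"model": model, "input": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["embeddings"][0]


def embed_old(model, text):
    """Call the deprecated /api/embeddings endpoint; it returns one vector."""
    req = request.Request(
        f"{OLLAMA}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


if __name__ == "__main__":
    text = "The quick brown fox jumps over the lazy dog."
    vecs = {
        "embed": embed_new("all-minilm", text),
        "embeddings": embed_old("all-minilm", text),
    }
    for name, v in vecs.items():
        print(name, v[:5])  # inspect the first 5 elements, as in the screenshots
    print("cosine:", cosine(vecs["embed"], vecs["embeddings"]))
```

A SentenceTransformers vector for the same prompt can be dropped into the same `cosine` comparison to complete the three-way matrix.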

The embeddings are clearly different, both compared to SentenceTransformers and between Ollama 0.3.3 and 0.3.12.

The cosine similarity between Ollama and SentenceTransformers is close, though. I also built a 3500-chunk vector store twice, once with Ollama and once with SentenceTransformers. When I query them, I get the same chunks back, albeit in a different order. That makes me think every embedded vector is being transformed somehow (relative to SentenceTransformers), but more or less uniformly across all of them.
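One quick check of that "uniform transform" hypothesis: if two backends return the same directions but apply different scaling or normalization, cosine similarity is unaffected and the vectors agree after L2-normalizing both sides, even though raw element values differ. A sketch (the vectors below are placeholders standing in for real embeddings, not actual model output):

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


# b is a scaled copy of a: same direction, different normalization.
a = [0.12, -0.45, 0.33, 0.08]
b = [2 * x for x in a]

print(abs(cosine(a, b) - 1.0) < 1e-12)  # True: scaling preserves direction
na, nb = l2_normalize(a), l2_normalize(b)
print(max(abs(x - y) for x, y in zip(na, nb)) < 1e-12)  # True after normalizing
```

If real vectors from the two backends pass the same check, the difference is only normalization; if cosine similarity is merely close to 1 rather than equal, something beyond scaling (pooling, tokenization, quantization) differs.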

Is this expected behavior? If so, is there anywhere I can read up on what Ollama is doing under the hood?

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.3.3, 0.3.12

GiteaMirror added the bug label 2026-05-04 07:24:49 -05:00

@rick-github commented on GitHub (Oct 3, 2024):

I have no input on the rest of your document, but `/api/embeddings` [is deprecated](https://github.com/ollama/ollama/blob/main/docs/api.md#:~:text=Note%3A%20this%20endpoint%20has%20been%20superseded%20by%20/api/embed).
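For anyone migrating: the superseding `/api/embed` endpoint renames the request and response fields. A sketch of the shapes as documented in the API docs (field names taken from the docs, not verified against a running server):

```python
# POST /api/embeddings (deprecated) takes "prompt", returns {"embedding": [...]}
old_request = {"model": "all-minilm", "prompt": "hello"}

# POST /api/embed takes "input" (string or list), returns {"embeddings": [[...], ...]}
new_request = {"model": "all-minilm", "input": ["hello"]}


def migrate(old):
    """Rewrite a deprecated /api/embeddings payload for /api/embed."""
    return {"model": old["model"], "input": [old["prompt"]]}


print(migrate(old_request) == new_request)  # True
```

The batched response (`embeddings` as a list of vectors) is one visible difference between the two endpoints, on top of any numerical discrepancies.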


Reference: github-starred/ollama#66554