[GH-ISSUE #11856] The result of ollama bge-m3 vector is different from vllm and tei #7870

Open
opened 2026-04-12 20:01:51 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @mpc-killer on GitHub (Aug 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11856

What is the issue?

When serving BGE-M3 with Ollama, the embedding vector changes abruptly at the 513th token: the cosine similarity between the vector generated from 513 tokens and the vector generated from 512 tokens is only about 0.1, even though the 513th token is a meaningless symbol character.

The same discontinuity occurs at 1025 (512 × 2 + 1) and 1537 (512 × 3 + 1) tokens.

You can reproduce this behavior with the following Java code:

```java
// Any string of roughly 3000 tokens
String fullText = "...";

List<Double> lastEmbedding = getOllamaEmbedding("");

// Note: the loop advances one character at a time; the jump is observed
// when the prefix reaches the 513th, 1025th, or 1537th token.
for (int currentPosition = 1; currentPosition < fullText.length(); currentPosition++) {
    String thisText = fullText.substring(0, currentPosition);
    List<Double> thisEmbedding = getOllamaEmbedding(thisText);

    // When the prefix crosses 513, 1025, or 1537 tokens,
    // cosineValue suddenly drops from 0.99+ to about 0.1.
    double cosineValue = calculateCosineValue(thisEmbedding, lastEmbedding);

    lastEmbedding = thisEmbedding;
}
```
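The helper `calculateCosineValue` is not shown in the report; a minimal sketch of such a helper (hypothetical name taken from the snippet above, implementing standard cosine similarity) could be:

```java
import java.util.List;

public class CosineDemo {
    // Standard cosine similarity: dot(a, b) / (|a| * |b|).
    // Assumes both vectors are non-zero and the same length.
    static double calculateCosineValue(List<Double> a, List<Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            dot += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Identical vectors: similarity is (numerically) ~1.0.
        System.out.println(calculateCosineValue(List.of(1.0, 2.0), List.of(1.0, 2.0)));
        // Orthogonal vectors: similarity is 0.0.
        System.out.println(calculateCosineValue(List.of(1.0, 0.0), List.of(0.0, 1.0)));
    }
}
```

A drop from ~0.99 to ~0.1 in this metric, as described above, means the two vectors are nearly unrelated despite the inputs differing by a single character.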

However, the above phenomenon does not occur when using vLLM or Text Embeddings Inference (TEI). Does Ollama apply any special handling to bge-m3? I have not found any relevant clues online.

This problem causes the cosine similarity between a text of 0–512 tokens and a text of 513–1024 tokens to be extremely low, even when the two texts are highly semantically related.
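The helper `getOllamaEmbedding` is also not shown; the sketch below (an assumption, not the reporter's code) calls Ollama's REST embedding endpoint, `POST /api/embeddings` with a `model` and `prompt` field, which returns a JSON object containing an `embedding` array. The localhost URL, model tag, and the minimal hand-rolled JSON handling are illustrative only — a real client should use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

public class OllamaEmbeddingDemo {
    // Assumed local Ollama endpoint and model tag; adjust as needed.
    static final String URL = "http://localhost:11434/api/embeddings";
    static final String MODEL = "bge-m3";

    static List<Double> getOllamaEmbedding(String text) throws Exception {
        // Naive JSON escaping; sufficient for plain text without control characters.
        String body = "{\"model\":\"" + MODEL + "\",\"prompt\":\""
                + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return parseEmbedding(response.body());
    }

    // Minimal parser for a response of the form {"embedding":[...]}.
    static List<Double> parseEmbedding(String json) {
        int start = json.indexOf('[') + 1;
        int end = json.indexOf(']', start);
        List<Double> values = new ArrayList<>();
        for (String part : json.substring(start, end).split(",")) {
            values.add(Double.parseDouble(part.trim()));
        }
        return values;
    }

    public static void main(String[] args) {
        // Demonstrate the parser on a canned response (no server required).
        System.out.println(parseEmbedding("{\"embedding\":[0.1,-0.2,0.3]}"));
        // prints [0.1, -0.2, 0.3]
    }
}
```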

Relevant log output


OS

No response

GPU

GTX 1080 Ti, RTX 3060

CPU

No response

Ollama version

0.6.6

GiteaMirror added the bug label 2026-04-12 20:01:51 -05:00

Reference: github-starred/ollama#7870