[GH-ISSUE #7595] Inconsistent Embedding Results with Non-Power-of-Two Context Sizes #4843

Open
opened 2026-04-12 15:50:28 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @ItzCrazyKns on GitHub (Nov 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7595

What is the issue?

When using different context sizes (num_ctx) with the Ollama embedding model, I noticed large differences in the cosine similarity of the embeddings. Specifically, when I set the context size to a non-power-of-two value (such as 513), the similarity scores drop significantly compared to powers of two (such as 512 or 1024). This suggests that the model might be optimized for powers of two, leading to inconsistent results with other values.

In contrast, other embedding providers like FastEmbed and Sentence Transformers produce stable results even with context sizes like 2^x + 1 (e.g., 513). The similarity between FastEmbed and Sentence Transformers embeddings is nearly perfect, regardless of context size, indicating that this issue seems specific to Ollama.

Steps to Reproduce

  1. Run the code below to generate embeddings with Ollama using different context sizes (512, 513, and 1024).
  2. Compare the cosine similarity of these embeddings with those from FastEmbed and Sentence Transformers.
  3. Observe that Ollama’s similarity scores vary significantly with non-power-of-two context sizes, while FastEmbed and Sentence Transformers remain consistent.

Code

from ollama import Client
from fastembed import TextEmbedding
from sentence_transformers import SentenceTransformer
import numpy as np

# Input text to embed; should be long enough to exercise the different context sizes.
target_data = """Text data, should be something big."""

# Reference embeddings from FastEmbed and Sentence Transformers, both using the same Nomic model.
fe_nomic = TextEmbedding(model_name="nomic-ai/nomic-embed-text-v1.5", cache_dir="fastembed_cache")
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# Client for a locally running Ollama server.
ollama = Client(host='http://localhost:11434')

# Embed the same input with three different context sizes: 512, 513, and 1024.
ollama512 = ollama.embed(
    model="nomic-embed-text:v1.5",
    truncate=True,
    options={
        "num_ctx": 512
    },
    input=target_data
)

ollama513 = ollama.embed(
    model="nomic-embed-text:v1.5",
    truncate=True,
    options={
        "num_ctx": 513
    },
    input=target_data
)

ollama1024 = ollama.embed(
    model="nomic-embed-text:v1.5",
    truncate=True,
    options={
        "num_ctx": 1024
    },
    input=target_data
)

# Pull out the raw embedding vectors from each provider for comparison.
ollama1 = np.array(ollama512["embeddings"][0])
ollama2 = np.array(ollama513["embeddings"][0])
ollama3 = np.array(ollama1024["embeddings"][0])
fe_embeddings = list(fe_nomic.embed([target_data]))[0]
embeddings = np.array(model.encode(target_data))

from numpy import dot
from numpy.linalg import norm

# Short aliases used in the cosine-similarity comparisons below.
a = ollama1         # Ollama, num_ctx=512
b = ollama2         # Ollama, num_ctx=513
c = fe_embeddings   # FastEmbed
d = embeddings      # Sentence Transformers
f = ollama3         # Ollama, num_ctx=1024

print("-------- Testing 2^9 ---------")
print(dot(a, c)/(norm(a)*norm(c)), "- FastEmbed vs Ollama 512")
print(dot(a, d)/(norm(a)*norm(d)), " - Sentence Transformers vs Ollama 512")
print("--------- Testing 2^9+1 ---------")
print(dot(b, c)/(norm(b)*norm(c)), "- FastEmbed vs Ollama 513")
print(dot(d, b)/(norm(d)*norm(b)), " - Sentence Transformers vs Ollama 513")
print("--------- Testing 2^10 ---------")
print(dot(f, c)/(norm(f)*norm(c)), "- FastEmbed vs Ollama 1024")
print(dot(f, d)/(norm(f)*norm(d)), "- Sentence Transformers vs Ollama 1024")
print("--------- Fastembed vs Sentence Transformers ---------")
print(dot(c, d)/(norm(c)*norm(d)), "- FastEmbed vs Sentence Transformers")

Results

| Context Size | Model Comparison                   | Cosine Similarity  |
|--------------|------------------------------------|--------------------|
| 512          | FastEmbed vs Ollama                | 0.9376883175384901 |
| 512          | Sentence Transformers vs Ollama    | 0.9376883820976254 |
| 513          | FastEmbed vs Ollama                | 0.4483180557634305 |
| 513          | Sentence Transformers vs Ollama    | 0.4483179868322506 |
| 1024         | FastEmbed vs Ollama                | 0.9136944500983835 |
| 1024         | Sentence Transformers vs Ollama    | 0.9136943746759493 |
| -            | FastEmbed vs Sentence Transformers | 1.0000001          |

Observations

  • When using context sizes that are powers of two (512 and 1024), the cosine similarity between Ollama embeddings and other models is high, indicating consistent results.
  • For the non-power-of-two context size (513), the cosine similarity scores drop significantly, showing lower consistency.
  • FastEmbed and Sentence Transformers provide stable and nearly identical embeddings across all context sizes, including non-standard ones like 513. The cosine similarity between FastEmbed and Sentence Transformers embeddings is nearly 1.0, indicating that the two are effectively identical.
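
Until the underlying cause is identified, one possible caller-side mitigation suggested by these observations is to coerce num_ctx to a power of two before requesting embeddings. The sketch below is purely illustrative; next_power_of_two and embed_with_safe_ctx are hypothetical helpers, not part of the Ollama client.

import math
from ollama import Client

def next_power_of_two(n: int) -> int:
    # Round n up to the nearest power of two (returns n unchanged if it already is one).
    return 1 << math.ceil(math.log2(n))

def embed_with_safe_ctx(client: Client, text: str, num_ctx: int = 512):
    # Embed text with num_ctx coerced to a power of two before the call.
    safe_ctx = next_power_of_two(num_ctx)
    return client.embed(
        model="nomic-embed-text:v1.5",
        truncate=True,
        options={"num_ctx": safe_ctx},
        input=text,
    )

# Example: num_ctx=513 would be bumped to 1024 before calling Ollama.
# client = Client(host='http://localhost:11434')
# result = embed_with_safe_ctx(client, "Some long text...", num_ctx=513)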

Expected Behavior

The model should produce stable and consistent embeddings regardless of the context size, as long as the size is within reasonable limits. Non-standard context sizes (like 2^x + 1) should not lead to significantly different embeddings.
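
To make this expectation concrete, a consistency check along the following lines could flag the regression. This is only a sketch: it assumes a local Ollama server with nomic-embed-text:v1.5 available, and the 0.99 similarity threshold is an arbitrary illustration, not a documented guarantee.

import numpy as np
from ollama import Client

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

client = Client(host='http://localhost:11434')
text = "Text data, should be something big."

def embed_at(ctx: int):
    # Embed the same text with a given context size and return the vector.
    resp = client.embed(
        model="nomic-embed-text:v1.5",
        truncate=True,
        options={"num_ctx": ctx},
        input=text,
    )
    return np.array(resp["embeddings"][0])

base = embed_at(512)
for ctx in (513, 1024):
    sim = cosine(base, embed_at(ctx))
    # Embeddings of the same text should be nearly identical regardless of num_ctx.
    assert sim > 0.99, f"num_ctx={ctx}: cosine similarity {sim:.4f} is unexpectedly low"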

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.4.1

GiteaMirror added the bug label 2026-04-12 15:50:28 -05:00
Author
Owner

@ItzCrazyKns commented on GitHub (Nov 10, 2024):

Only seems to happen with Nomic embed text v1.5

Reference: github-starred/ollama#4843