[GH-ISSUE #6262] Batch embeddings get progressively worse with larger batches #65957

GiteaMirror · 2026-05-03T23:20:30-05:00

GiteaMirror commented

2026-05-03 23:20:30 -05:00

Originally created by @jorgetrejo36 on GitHub (Aug 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6262

What is the issue?

I am using the ollama Python library for all the results I am getting.

When I create embeddings with ollama.embed(), they get progressively worse as the batch size grows, compared with creating the embeddings one at a time. There seems to be a jump at batch sizes of 16 or larger. All of my tests assume that the embeddings come back in the same order as the inputs, since an issue I submitted about that not long ago was resolved (#6187).

It is imperative that these embeddings be accurate given I am using them for a RAG app and retrieval needs to be good for all inserted embeddings.

I ran the function "chunk_text" with text from Peter Pan (https://www.gutenberg.org/files/16/16-h/16-h.htm), chunk_size = 256, and max_characters of 65536 (256 chunks with 256 characters each).

I ran the function "test" with chunks from the above function call and a batch_size_list of [2, 4, 8, 16, 32, 64, 128, 256].

Below all the code are results and some plots of the results.

import ollama
import numpy as np
import os
from typing import List
from dotenv import load_dotenv
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt

load_dotenv()

# Embedding model used was "bge-large:latest"
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL")
EPS=1e-4

def chunk_text(text: str, chunk_size: int, max_characters: int) -> List[str]:
    chunks = []
    for i in range(0, min(len(text), max_characters), chunk_size):
        chunk = text[i:i + chunk_size]
        chunks.append(chunk)
    return chunks

# Used first few chapters of Peter Pan
text = ""

chunk_size = 256
# 256 is the max batch size that is defined later
chunks = chunk_text(text, chunk_size, chunk_size * 256)

def embed_string(s: str) -> np.ndarray:
    return np.array(ollama.embed(
        input=s,
        model=EMBEDDING_MODEL,
        options={
            
        },
        truncate=False
    )["embeddings"])[0]

def embed_list(s: List[str]) -> np.ndarray:
    return np.array(ollama.embed(
        input=s,
        model=EMBEDDING_MODEL,
        options={
            
        },
        truncate=False
    )["embeddings"])

def test(list_of_string: List[str], batch_sizes: List[int]) -> tuple:
    avg_distances = []
    avg_similarites = []

    max_distances = []
    min_similarities = []

    for batch_size in batch_sizes:
        print(f"Results for batch size: {batch_size}")
        singles = np.array([embed_string(s) for s in list_of_string[:batch_size]])
        as_list = embed_list(list_of_string[:batch_size])
    
        distances = []
        for single_embedding, as_list_embedding in zip(singles, as_list):
            distance = np.sqrt(((single_embedding - as_list_embedding) ** 2).sum())
            distances.append(distance)

        distances = np.array(distances)

        mean = np.mean(distances)
        max = np.max(distances)

        avg_distances.append(mean)
        max_distances.append(max)

        print("Euclidean Distance:")
        print(f"\tMean of euclidean distances: {mean}")
        print(f"\tMax euclidean distance: {max}")
        
        # Cosine similarity
        similarities = []
        for single_embedding, as_list_embedding in zip(singles, as_list):
            vector1 = single_embedding.reshape(1, -1)
            vector2 = as_list_embedding.reshape(1, -1)
            similarity = cosine_similarity(vector1, vector2)
            similarities.append(similarity)


        similarities = np.array(similarities)

        mean = np.mean(similarities)  
        min = np.min(similarities)

        avg_similarites.append(mean)
        min_similarities.append(min)

        print("Cosine Similarity:")
        print(f"\tMean of cosine similarites: {mean}")
        print(f"\tMin cosine similarity: {min}")

        print("==========================================================")

    return (batch_sizes, avg_distances, avg_similarites, max_distances, min_similarities)

batch_sizes_list = [2**i for i in range(1, 9)]
batch_sizes, avg_distances, avg_similarities, max_distances, min_similarities = test(chunks, batch_sizes_list)

RESULTS:

Results for batch size: 2
Euclidean Distance:
	Mean of euclidean distances: 0.0027100650691554615
	Max euclidean distance: 0.003069791207141852
Cosine Similarity:
	Mean of cosine similarites: 0.999996263071194
	Min cosine similarity: 0.99999528818776
==========================================================
Results for batch size: 4
Euclidean Distance:
	Mean of euclidean distances: 0.002698965850379388
	Max euclidean distance: 0.0032083663351101474
Cosine Similarity:
	Mean of cosine similarites: 0.9999962901587796
	Min cosine similarity: 0.9999948531925777
==========================================================
Results for batch size: 8
Euclidean Distance:
	Mean of euclidean distances: 0.003292175370207458
	Max euclidean distance: 0.0038060000679778546
Cosine Similarity:
	Mean of cosine similarites: 0.9999945197318343
	Min cosine similarity: 0.999992757181494
==========================================================
Results for batch size: 16
Euclidean Distance:
	Mean of euclidean distances: 0.11461230989305338
	Max euclidean distance: 1.136748198080119
Cosine Similarity:
	Mean of cosine similarites: 0.946342128810411
	Min cosine similarity: 0.35390177365096614
==========================================================
Results for batch size: 32
Euclidean Distance:
	Mean of euclidean distances: 0.08102131219835153
	Max euclidean distance: 0.8772319282635773
Cosine Similarity:
	Mean of cosine similarites: 0.9674320902167836
	Min cosine similarity: 0.6152323203539565
==========================================================
Results for batch size: 64
Euclidean Distance:
	Mean of euclidean distances: 0.09294858544026222
	Max euclidean distance: 1.1095371609913112
Cosine Similarity:
	Mean of cosine similarites: 0.960566610535093
	Min cosine similarity: 0.38446375093677954
==========================================================
Results for batch size: 128
Euclidean Distance:
	Mean of euclidean distances: 0.08298749768139443
	Max euclidean distance: 0.9481241092937398
Cosine Similarity:
	Mean of cosine similarites: 0.9696922749059266
	Min cosine similarity: 0.5505303906957912
==========================================================
Results for batch size: 256
Euclidean Distance:
	Mean of euclidean distances: 0.08726932907397295
	Max euclidean distance: 1.0992560821737951
Cosine Similarity:
	Mean of cosine similarites: 0.966701897336651
	Min cosine similarity: 0.39581805237428414
==========================================================
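
A note on reading the two metrics together: for unit-length vectors, Euclidean distance and cosine similarity carry the same information. The reported numbers are consistent with the embeddings being unit-normalized; that is an inference from the figures above, not something stated in the issue.

$$
\lVert a - b \rVert^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b = 2 - 2\cos(a, b) \quad \text{when } \lVert a \rVert = \lVert b \rVert = 1
$$

For batch size 16, the minimum cosine similarity of 0.3539 gives $\sqrt{2 - 2 \cdot 0.3539} \approx 1.1367$, which matches the reported maximum Euclidean distance of 1.1367. The two columns therefore describe the same worst-case pair, not two independent failures.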

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.3.4

Plots:

batch_size_vs_max_euclidean_distance: https://github.com/user-attachments/assets/d82d7e56-44fd-4066-b697-dcf158a23fa8
batch_size_vs_min_cosine_similarity: https://github.com/user-attachments/assets/489a45c5-ed08-4b98-bece-9512ed27d7b4

GiteaMirror added the bug label 2026-05-03 23:20:30 -05:00

GiteaMirror commented

2026-05-03 23:20:34 -05:00

@igorschlum commented on GitHub (Aug 8, 2024):

Hi @jorgetrejo36, I would like to run your code to see if I can replicate the issue on macOS, but some pieces are missing. Can you provide them?

GiteaMirror commented

2026-05-03 23:20:36 -05:00

@jmorganca commented on GitHub (Aug 8, 2024):

Sorry about this - looking into it

GiteaMirror commented

2026-05-03 23:20:38 -05:00

@rick-github commented on GitHub (Aug 8, 2024):

Which model are you using? I was unable to replicate with

  "nomic-embed-text:latest",
  "paraphrase-multilingual:latest",
  "snowflake-arctic-embed:latest",
  "mxbai-embed-large:latest",
  "bge-large:latest",
  "all-minilm:l12",

GiteaMirror commented

2026-05-03 23:20:42 -05:00

@jorgetrejo36 commented on GitHub (Aug 9, 2024):

I am using bge-large:latest

GiteaMirror commented

2026-05-03 23:20:46 -05:00

@royjhan commented on GitHub (Aug 12, 2024):

@jorgetrejo36 is the problem still persisting?

GiteaMirror commented

2026-05-03 23:20:50 -05:00

@jorgetrejo36 commented on GitHub (Aug 13, 2024):

@royjhan I just ran the tests again and I got similar results. I revised the code above with the actual function calls I made so you or anyone else should just be able to copy the code and immediately run it and see what I see. Does running it on your own machines result in zero discrepancies or what is it looking like? I'm just curious because maybe there's something wrong on my side that I'm not noticing.

GiteaMirror commented

2026-05-03 23:20:51 -05:00

@rick-github commented on GitHub (Aug 13, 2024):

I did wget https://www.gutenberg.org/files/16/16-h/16-h.htm, made this change

--- 6262.py.orig	2024-08-13 18:24:05.560356363 +0200
+++ 6262.py	2024-08-13 18:24:36.903051485 +0200
@@ -21,6 +21,8 @@
 
 # Used first few chapters of Peter Pan
 text = ""
+with open("16-h.htm") as fd:
+  text = fd.read()
 
 chunk_size = 256
 # 256 is the max batch size that is defined later

ran this command:

EMBEDDING_MODEL=bge-large:latest python3 ./6262.py

and got these results:

Results for batch size: 2
Euclidean Distance:
	Mean of euclidean distances: 0.0030663777330302523
	Max euclidean distance: 0.0036879363736636315
Cosine Similarity:
	Mean of cosine similarites: 0.999995105494726
	Min cosine similarity: 0.9999931995599575
==========================================================
Results for batch size: 4
Euclidean Distance:
	Mean of euclidean distances: 0.0027463073727389655
	Max euclidean distance: 0.00358716464884748
Cosine Similarity:
	Mean of cosine similarites: 0.9999960895006781
	Min cosine similarity: 0.9999935661230478
==========================================================
Results for batch size: 8
Euclidean Distance:
	Mean of euclidean distances: 0.0025482074992666114
	Max euclidean distance: 0.00358716464884748
Cosine Similarity:
	Mean of cosine similarites: 0.999996656158536
	Min cosine similarity: 0.9999935661230478
==========================================================
Results for batch size: 16
Euclidean Distance:
	Mean of euclidean distances: 0.0024955809009619316
	Max euclidean distance: 0.0034347715901393936
Cosine Similarity:
	Mean of cosine similarites: 0.9999968216936727
	Min cosine similarity: 0.9999941011717459
==========================================================
Results for batch size: 32
Euclidean Distance:
	Mean of euclidean distances: 0.0024666019674360485
	Max euclidean distance: 0.0034347715901393936
Cosine Similarity:
	Mean of cosine similarites: 0.9999969092075001
	Min cosine similarity: 0.9999941011717459
==========================================================
Results for batch size: 64
Euclidean Distance:
	Mean of euclidean distances: 0.002441116828212457
	Max euclidean distance: 0.0034347715901393936
Cosine Similarity:
	Mean of cosine similarites: 0.9999969774363446
	Min cosine similarity: 0.9999941011717459
==========================================================
Results for batch size: 128
Euclidean Distance:
	Mean of euclidean distances: 0.0024016578407820323
	Max euclidean distance: 0.003220162033245072
Cosine Similarity:
	Mean of cosine similarites: 0.9999970830756011
	Min cosine similarity: 0.999994815278163
==========================================================
Results for batch size: 256
Euclidean Distance:
	Mean of euclidean distances: 0.002390881610128061
	Max euclidean distance: 0.00358716464884748
Cosine Similarity:
	Mean of cosine similarites: 0.9999971108082498
	Min cosine similarity: 0.9999935661230478
==========================================================

GiteaMirror commented

2026-05-03 23:20:52 -05:00

@royjhan commented on GitHub (Aug 13, 2024):

@rick-github thank you for contributing here. @jorgetrejo36 how are you loading Peter Pan into text, and can you see if the above changes change anything for you? I'm still not able to recreate the issue.

GiteaMirror commented

2026-05-03 23:20:53 -05:00

@jorgetrejo36 commented on GitHub (Aug 14, 2024):

@royjhan Did some more testing and it seems to be a problem with the parallelism ollama does. We took away these two env vars:

OLLAMA_MAX_QUEUE=1000
OLLAMA_NUM_PARALLEL=100

and we got similar results to @rick-github.

Ideally, we would like to keep these env vars set but until then a temp fix will be just disabling these settings. Let me know if you get anything similar to my initial results when these env vars are set.

GiteaMirror commented

2026-05-03 23:20:56 -05:00

@rick-github commented on GitHub (Aug 14, 2024):

Can you post some server logs (https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues)? I'm curious as to the effect of OLLAMA_NUM_PARALLEL/OLLAMA_MAX_QUEUE on scheduling on your hardware.

GiteaMirror commented

2026-05-03 23:20:57 -05:00

@jorgetrejo36 commented on GitHub (Aug 21, 2024):

@rick-github

server-logs.txt (https://github.com/user-attachments/files/16698360/server-logs.txt)

With OLLAMA_NUM_PARALLEL = 8 I still get the same issue. When I did a single test with OLLAMA_NUM_PARALLEL = 4, everything seemed to work fine.

GiteaMirror commented

2026-05-03 23:20:59 -05:00

@rick-github commented on GitHub (Aug 21, 2024):

Thanks, I also got poor results at OLLAMA_NUM_PARALLEL=8, so something to keep in mind when doing bulk embeddings.

GiteaMirror commented

2026-05-03 23:21:01 -05:00

@jorgetrejo36 commented on GitHub (Sep 12, 2024):

@rick-github @royjhan Is there any possibility of this being fixed or has there been any updates on it?

GiteaMirror commented

2026-05-03 23:21:03 -05:00

@rick-github commented on GitHub (Sep 13, 2024):

I'm looking at issues with generation being bad with high OLLAMA_NUM_PARALLEL, so it's not just embeddings. But I currently don't have a fix, it's a work in progress.

GiteaMirror commented

2026-05-03 23:21:05 -05:00

@gaileys commented on GitHub (Oct 11, 2024):

I can confirm I'm seeing similar problems with a batch size of 128. I have OLLAMA_NUM_PARALLEL=4. I haven't run the above code to test the extent of this problem, but I have moved from non-batch processing to batch and see this clearly in the results.

GiteaMirror commented

2026-05-03 23:21:07 -05:00

@gaileys commented on GitHub (Oct 15, 2024):

My understanding is that the problem is that the order of the resulting embeddings is not the same as the order of the chunks supplied. The embeddings are correct, but they are randomised.

GiteaMirror commented

2026-05-03 23:21:10 -05:00

@gaileys commented on GitHub (Nov 5, 2024):

> My understanding is that the problem is that the order of the resulting embeddings is not the same as the order of the chunks supplied. The embeddings are correct, but they are randomised.

OK, I have run some more tests and that isn't the case. It is as originally stated: the larger batch sizes generate increasingly dispersed results. I'm using the embeddings for clustering and the results are unusable for me at batch sizes of 128.
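
One way to tell the two hypotheses apart (embeddings returned in the wrong order versus embeddings that are genuinely different) is to compare every batch embedding against every single-request embedding. The sketch below is a minimal illustration in pure Python. The helper names and the 0.999 threshold are assumptions chosen for this example, not part of ollama or of the original test script.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_matches(singles: Sequence[Vector], batch: Sequence[Vector]) -> List[int]:
    """For each batch embedding, the index of the most similar single embedding."""
    return [
        max(range(len(singles)), key=lambda j: cosine(vec, singles[j]))
        for vec in batch
    ]


def classify(singles: Sequence[Vector], batch: Sequence[Vector],
             threshold: float = 0.999) -> str:
    """Return "ok", "reordered", or "degraded".

    The threshold is an assumed cut-off: the healthy runs in this thread
    show cosine similarities above 0.9999, the failing ones go below 0.4.
    """
    matches = best_matches(singles, batch)
    best_sims = [cosine(vec, singles[j]) for vec, j in zip(batch, matches)]
    if min(best_sims) < threshold:
        # At least one batch embedding has no close counterpart at all.
        return "degraded"
    if matches == list(range(len(batch))):
        return "ok"
    # Every embedding has a close counterpart, but not at its own index.
    return "reordered"
```

With `singles` and `as_list` from the test script above, `classify(singles, as_list)` returning "degraded" would point to genuinely different embeddings and not to a shuffled response.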

GiteaMirror commented

2026-05-03 23:21:14 -05:00

@KastanDay commented on GitHub (Mar 19, 2025):

@jorgetrejo36 @gaileys and @rick-github any suggestions on max stable OLLAMA_NUM_PARALLEL? I see OLLAMA_NUM_PARALLEL=128 is "unusable", but do you have perfect performance with OLLAMA_NUM_PARALLEL=64 or 32 or less?

Thanks for any guidance on the most performant + most stable config.

GiteaMirror commented

2026-05-03 23:21:16 -05:00

@rick-github commented on GitHub (Mar 20, 2025):

ollama doesn't currently support parallel embeddings (as of 90ca84172). If you need parallelism, the only method at the moment is to run multiple ollama servers and run a proxy in front to present a unified interface (eg https://github.com/ollama/ollama/issues/8186#issuecomment-2560443545)
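
As a rough illustration of the multi-server idea, the sketch below spreads one list of chunks across several ollama servers from the client side and reassembles the results in input order. It is a client-side alternative to a proxy, not the setup from the linked comment. The host URLs, the model name, and the function names are assumptions for this example; `ollama.Client(host=...)` and its `embed` method come from the ollama Python library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

EmbedFn = Callable[[str, List[str]], List[List[float]]]


def ollama_embed(host: str, texts: List[str]) -> List[List[float]]:
    """Embed texts on one server. The model name is an assumption."""
    import ollama  # imported lazily so the sharding logic works without it

    client = ollama.Client(host=host)
    return client.embed(model="bge-large:latest", input=texts)["embeddings"]


def embed_sharded(texts: List[str], hosts: List[str],
                  embed_fn: EmbedFn = ollama_embed) -> List[List[float]]:
    """Split texts into contiguous shards, one per host, preserving input order."""
    if not hosts:
        raise ValueError("at least one host is required")
    if not texts:
        return []
    shard_size = -(-len(texts) // len(hosts))  # ceiling division
    shards = [texts[i:i + shard_size] for i in range(0, len(texts), shard_size)]
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # map() yields results in submission order, so concatenating the
        # shards gives embeddings in the same order as the input texts.
        results = pool.map(embed_fn, hosts, shards)
        return [vec for shard in results for vec in shard]
```

A call might look like `embed_sharded(chunks, ["http://localhost:11434", "http://localhost:11435"])`, assuming two servers are listening on those (hypothetical) ports.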

GiteaMirror commented

2026-05-03 23:21:18 -05:00

@tobiaswuerth commented on GitHub (Apr 6, 2025):

> ollama doesn't currently support parallel embeddings (as of 90ca841). If you need parallelism, the only method at the moment is to run multiple ollama servers and run a proxy in front to present a unified interface (eg #8186 (comment))

Looking forward to embedding batching, is there a release timeline for this feature?

GiteaMirror commented

2026-05-03 23:21:18 -05:00

@rick-github commented on GitHub (Apr 7, 2025):

If I were to guess, I'd say this will be a feature of the new runner architecture. While it doesn't support embedding at the moment (https://github.com/ollama/ollama/blob/0f3f9e353df96d4cfc40ac19114c782a57fe30f5/runner/ollamarunner/runner.go#L867), the new architecture means ollama developers won't have the same limitations as with the llama.cpp backend. How long until it gets there is anybody's guess.

GiteaMirror commented

2026-05-03 23:21:19 -05:00

@jorgetrejo36 commented on GitHub (Apr 7, 2025):

Just to clarify,

Does batch processing for embeddings work at all? If OLLAMA_NUM_PARALLEL is just set to 1 will batch processing work or does the current implementation depend on the parallelism stuff?

@rick-github


@rick-github commented on GitHub (Apr 7, 2025):

Batch processing works, just not in parallel. The contents of the batch are serialized, processed, and the embeddings are then returned to the client.

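Since a batch is described here as being processed serially, a batched call should return the same vectors as embedding each input on its own. A minimal sketch of that check, assuming a hypothetical `embed_fn(texts)` wrapper around whatever client call is in use (for example `ollama.embed(model=..., input=texts)["embeddings"]`); the tolerance default mirrors the `EPS=1e-4` used in the original script.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: embed_fn(texts) returns one vector per text, in order.
EmbedFn = Callable[[List[str]], List[List[float]]]


def mismatched_indices(
    texts: Sequence[str],
    embed_fn: EmbedFn,
    eps: float = 1e-4,
) -> List[int]:
    """Return the indices whose batched embedding differs from the
    one-at-a-time embedding by more than eps in any component."""
    batched = embed_fn(list(texts))
    if len(batched) != len(texts):
        raise ValueError("embed_fn returned a different number of vectors than inputs")

    bad = []
    for i, text in enumerate(texts):
        single = embed_fn([text])[0]
        # Largest absolute component-wise difference between the two vectors.
        max_diff = max((abs(a - b) for a, b in zip(single, batched[i])), default=0.0)
        if max_diff > eps:
            bad.append(i)
    return bad
```

An empty result means the batch agrees with the single-input calls within the tolerance; any returned index points at a chunk worth inspecting.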


@tobiaswuerth commented on GitHub (Apr 7, 2025):

> Batch processing works, just not in parallel. The contents of the batch are serialized, processed, and the embeddings are then returned to the client.

Ah, okay! Then I misunderstood, because the [website](https://ollama.com/blog/embedding-models) states at the bottom:

> Coming soon: Batch embeddings: processing multiple input data prompts simultaneously


@PaulCapestany commented on GitHub (Apr 8, 2025):

FWIW: unless I'm missing something, it appears that with a simple tweak, parallel processing to `/embed` can now work, including with batching. All I did was change [`numParallel = 1`](https://github.com/ollama/ollama/blob/ad22ace439eb3fab7230134e56bb6276a78347e4/server/sched.go#L198) to `numParallel = 4` and build a new ollama binary via `go install ./...`. After running a simple [shell script](https://gist.github.com/PaulCapestany/6d2c6f0d9bb7261ceedb736268ad6377) to sanity-check sequential vs. parallel requests, it seems to work fine for me.

It looks like the problem originated in llama.cpp (https://github.com/ggml-org/llama.cpp/issues/6722), and the issue was then addressed by disabling parallel embedding generation in ollama (https://github.com/ollama/ollama/pull/6467). It appears to me that it must have since been fixed in llama.cpp, as I am no longer getting server crashes.


@rick-github commented on GitHub (Apr 8, 2025):

The problem is not server crashes, it's that embeddings done in parallel are corrupted.

$ for p in 1 2 3 4 ; do echo parallel=$p ; seq 0 9 | parallel -n0 -P$p curl -s localhost:11434/api/embed -d \''{"model":"snowflake-arctic-embed","input":"What is the weather like today?"}'\' | jq -c '.embeddings[0]|.[0:3] + ["..."] + .[-3:]' ; done
parallel=1
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
parallel=2
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
parallel=3
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.00024531296,-0.05799037,-0.0064362884,"...",-0.004672217,-0.016983667,-0.043087896]
[0.00024531296,-0.05799037,-0.0064362884,"...",-0.004672217,-0.016983667,-0.043087896]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.00024531296,-0.05799037,-0.0064362884,"...",-0.004672217,-0.016983667,-0.043087896]
[0.00024531296,-0.05799037,-0.0064362884,"...",-0.004672217,-0.016983667,-0.043087896]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
parallel=4
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.00024531296,-0.05799037,-0.0064362884,"...",-0.004672217,-0.016983667,-0.043087896]
[0.00024531296,-0.05799037,-0.0064362884,"...",-0.004672217,-0.016983667,-0.043087896]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]
[0.0001809538,-0.05792252,-0.0065146764,"...",-0.004687789,-0.016948145,-0.043066468]

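The console output above shows two different vectors coming back for one identical input once requests run in parallel. A small sketch for flagging that from Python, assuming the responses have already been collected into a list; the helper names and the rounding precision are illustrative choices, not part of ollama. Rounding is a heuristic: values that straddle a rounding boundary can be split into separate buckets.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def count_distinct(vectors: Sequence[Sequence[float]], ndigits: int = 6) -> Counter:
    """Bucket vectors by their rounded components and count each bucket."""
    keys: List[Tuple[float, ...]] = [
        tuple(round(x, ndigits) for x in vector) for vector in vectors
    ]
    return Counter(keys)


def is_deterministic(vectors: Sequence[Sequence[float]], ndigits: int = 6) -> bool:
    """True when every response for the same input lands in one bucket."""
    return len(count_distinct(vectors, ndigits)) <= 1


# First three components of the two vectors seen in the console output above.
expected = [0.0001809538, -0.05792252, -0.0065146764]
corrupted = [0.00024531296, -0.05799037, -0.0064362884]

# Response order from the parallel=3 run above.
parallel_3 = [
    expected, corrupted, corrupted, expected, corrupted,
    corrupted, expected, expected, expected, expected,
]

if __name__ == "__main__":
    for key, count in count_distinct(parallel_3).items():
        print(count, key)
```

Running the same input repeatedly and requiring `is_deterministic(...)` to hold is a cheap regression check before trusting any parallel configuration.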


@PaulCapestany commented on GitHub (Apr 9, 2025):

@rick-github thanks, clearly in my haste I was missing many things 😅

I tried @jorgetrejo36's Python script (with your improvements) while running my build of ollama with parallelization enabled for embeddings (via `export OLLAMA_NUM_PARALLEL=4; ollama serve`) and, yeah, the consistency/similarity results were atrocious. Results were totally fine with `OLLAMA_NUM_PARALLEL=1`, though.

I was curious how to optimize embedding throughput via chunk sizing and/or batch sizing even under the `OLLAMA_NUM_PARALLEL=1` limitation. In case anyone else is too, here's what I saw in my testing ([repo here](https://github.com/PaulCapestany/embed-tests), caveat emptor):

![Image](https://github.com/user-attachments/assets/cf89f359-45bd-406f-b936-5a6f1999afc5)
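A batch-size sweep like the one described above could be timed with something along these lines. This is a sketch, not the code from the linked repo: `measure_throughput` and the `embed_fn(texts)` wrapper are hypothetical names, and a real run would want warm-up calls and repeated trials before drawing conclusions.

```python
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: embed_fn(texts) returns one vector per text.
EmbedFn = Callable[[List[str]], List[List[float]]]


def measure_throughput(
    texts: Sequence[str],
    embed_fn: EmbedFn,
    batch_sizes: Sequence[int],
) -> Dict[int, float]:
    """Embed the same texts once per batch size and report texts per second."""
    results: Dict[int, float] = {}
    for size in batch_sizes:
        if size <= 0:
            raise ValueError("batch sizes must be positive")
        start = time.perf_counter()
        # Requests are issued one after another, matching a serialized server.
        for i in range(0, len(texts), size):
            embed_fn(list(texts[i:i + size]))
        elapsed = time.perf_counter() - start
        results[size] = len(texts) / elapsed if elapsed > 0 else float("inf")
    return results
```

Comparing the values for, say, `[1, 8, 32, 128]` on the same chunks gives a rough picture of where larger batches stop paying off on a given machine.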


@vinipy12 commented on GitHub (Nov 17, 2025):

Is this still a thing? Or can we finally do parallel embeddings?


@rick-github commented on GitHub (Nov 17, 2025):

Embedding models still do not support parallelism. A client can make parallel calls, but they will be serialized for processing.


@vinipy12 commented on GitHub (Nov 17, 2025):

So, if I make multiple `ollama.embeddings()` calls through an `asyncio.Semaphore`, they will be queued and serialized, right?

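The client-side pattern being asked about can be sketched as follows. It only bounds how many requests are in flight from the client; per the earlier reply in this thread, the server is still expected to serialize embedding work. `embed_bounded` and `embed_one` are hypothetical names, and `embed_one` stands in for any coroutine that embeds a single text (for instance one wrapping the ollama library's async client).

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical signature: embed_one(text) is a coroutine returning one vector.
EmbedOne = Callable[[str], Awaitable[List[float]]]


async def embed_bounded(
    texts: Sequence[str],
    embed_one: EmbedOne,
    max_concurrency: int = 4,
) -> List[List[float]]:
    """Embed all texts with at most max_concurrency requests in flight,
    returning vectors in the same order as the inputs."""
    # Created inside the coroutine so it binds to the running event loop.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text: str) -> List[float]:
        async with semaphore:
            return await embed_one(text)

    # gather preserves the order of the awaitables it was given.
    return list(await asyncio.gather(*(guarded(text) for text in texts)))
```

Calling `asyncio.run(embed_bounded(chunks, embed_one, max_concurrency=2))` keeps the client from opening more than two requests at once, whatever the server does with them.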


Reference: github-starred/ollama#65957