[GH-ISSUE #11124] Ollama uses excessive CPU with ChromaDB when vector_store.similarity_search_with_score is used. #33096

Closed
opened 2026-04-22 15:22:59 -05:00 by GiteaMirror · 3 comments

Originally created by @doyoungim999 on GitHub (Jun 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11124

What is the issue?

I am using Chroma with OllamaEmbeddings for similarity_search_with_score.
I found that Chroma and Ollama use very high CPU (about 20 of 32 cores). Is this common?
My text is only 5 KB.

Is there any alternative that uses less CPU?
I am using Chroma's embedding_function. The Ollama process uses very high CPU, around 23 of 32 cores. This seems too high. Is this normal with the llama3.2 model? Do I need to change the model used for embeddings?

from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

embeddings = OllamaEmbeddings(
    model='llama3.2',
)
vector_store = Chroma(
    embedding_function=embeddings,
    persist_directory=vectorstore_path,
    collection_name=collection_name,
)

def vector_search(query):
    # Embeds the query via Ollama, then searches Chroma for the 4 nearest chunks.
    results = vector_store.similarity_search_with_score(
        query, k=4
    )
    return results

Relevant log output

High CPU usage with Ollama, around 20 cores.
The ollama process used 23 of 32 cores.

OS

OpenShift with CoreOS

GPU

No.

CPU

No response

Ollama version

ollama version is 0.6.2

GiteaMirror added the bug label 2026-04-22 15:22:59 -05:00

@rick-github commented on GitHub (Jun 18, 2025):

Try using an embedding model (https://ollama.com/search?c=embedding) for the embeddings. You can use any model to generate embeddings, but models dedicated to embeddings require fewer resources.

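Editor's note: a minimal sketch of what this suggestion could look like in the original code, swapping in a dedicated embedding model (bge-m3, which is tested later in this thread; nomic-embed-text would be another option). The `build_vector_store` helper and its defaults are hypothetical; the sketch assumes the langchain-ollama and langchain-chroma packages and a running Ollama server that has pulled the chosen model.

```python
def build_vector_store(persist_directory, collection_name, model="bge-m3"):
    """Build a Chroma store backed by a dedicated Ollama embedding model.

    Assumes the langchain-ollama and langchain-chroma packages are installed
    and the model has been pulled (e.g. `ollama pull bge-m3`).
    """
    from langchain_ollama import OllamaEmbeddings
    from langchain_chroma import Chroma

    # Dedicated embedding models (bge-m3, nomic-embed-text, ...) are much
    # smaller than chat models like llama3.2, so each embedding call costs
    # far less CPU.
    embeddings = OllamaEmbeddings(model=model)
    return Chroma(
        embedding_function=embeddings,
        persist_directory=persist_directory,
        collection_name=collection_name,
    )
```

The rest of the code (similarity_search_with_score with k=4) is unchanged; only the embedding model behind the store needs to change.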

@doyoungim999 commented on GitHub (Jun 23, 2025):

Hi,
I tested the llama3.2 model and the bge-m3 model for retrieving my data from Chroma DB.

  • llama3.2 uses almost 30% of my Ubuntu machine's CPU.
  • bge-m3 uses 5% of my Ubuntu machine's CPU.

What causes this big difference in CPU usage?
Is it that bge-m3 only supports embeddings, while llama3.2 supports both generation and embeddings?


@rick-github commented on GitHub (Jun 23, 2025):

> models dedicated to embeddings require fewer resources.

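Editor's note: a rough back-of-the-envelope calculation illustrates the gap. Taking approximate parameter counts from the published model cards (llama3.2 3B ≈ 3.2B parameters; bge-m3, an XLM-RoBERTa-large-based encoder, ≈ 0.57B) and the common rule of thumb of ~2 FLOPs per parameter per token for a forward pass:

```python
# Back-of-envelope: compute per embedded token scales with parameter count.
# Parameter counts are approximate, taken from the respective model cards.
LLAMA32_3B_PARAMS = 3.2e9   # llama3.2 (3B variant)
BGE_M3_PARAMS = 0.57e9      # bge-m3 (XLM-RoBERTa-large-based encoder)

def flops_per_token(params: float) -> float:
    # Rough rule of thumb: a forward pass costs ~2 FLOPs per parameter per token.
    return 2.0 * params

ratio = flops_per_token(LLAMA32_3B_PARAMS) / flops_per_token(BGE_M3_PARAMS)
print(f"llama3.2 needs roughly {ratio:.1f}x the compute of bge-m3 per token")
```

That ~5–6x compute ratio, plus the larger activation/KV footprint of a decoder-style chat model, is consistent with the observed 30% vs 5% CPU usage.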
Reference: github-starred/ollama#33096