[GH-ISSUE #1572] Embeddings response too slow #62898

Closed
opened 2026-05-03 10:40:53 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @perezjnv on GitHub (Dec 17, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1572

I ingested a CSV for fine-tuning a model called llama2-7b in .bin format. That worked well for me, but when using Ollama with a Modelfile that implements it, the responses are too slow. Any suggestions?

GiteaMirror added the embeddings, question, performance labels 2026-05-03 10:41:06 -05:00
Author
Owner

@igorschlum commented on GitHub (Dec 17, 2023):

@perezjnv can you provide a sample of the CSV file and the steps you followed? I will check whether it's too slow. I think it depends on the size of the CSV, the power of your computer, and your expectation of what counts as too slow.

Author
Owner

@perezjnv commented on GitHub (Dec 18, 2023):

Hello Igor! Thanks for answering. I am doing a proof of concept for a school system. I need to fine-tune on documents; the largest one has 126 pages. I am preparing questions and answers to ingest into a llama2-7b model.
I have run tests by ingesting the file llama-2-7b-chat.ggmlv3.q8_0.bin.
The idea is to have several versions of the llama2 model, each for a different study subject or topic.
I am using a Mac M1 for development, with 16 GB of RAM.
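For reference, pointing Ollama at a local weights file like that is typically just a `FROM` line in a Modelfile; the parameters and system prompt below are illustrative, not taken from the reporter's setup:

```
FROM ./llama-2-7b-chat.ggmlv3.q8_0.bin
PARAMETER temperature 0.7
SYSTEM You answer questions about the ingested course documents.
```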

Author
Owner

@easp commented on GitHub (Dec 18, 2023):

So you are using a custom, fine-tuned 7b model? What quantization level? How large are the model weights?

Author
Owner

@mchiang0610 commented on GitHub (Mar 11, 2024):

Hi @perezjnv, sorry about this. How are you ingesting the CSV file? Are you using embeddings? Ollama just started supporting embeddings, and I'm wondering whether you are still hitting the same problem.

Also wondering if you are using the EMBED feature within modelfile?

Author
Owner

@jmorganca commented on GitHub (May 6, 2024):

Hi there, new embedding models are available now that should be much, much faster:

https://ollama.com/library/all-minilm
https://ollama.com/library/nomic-embed-text
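As a rough sketch of how these models get used (assuming Ollama's `/api/embeddings` endpoint, which returns an `embedding` list of floats per prompt, and the model names above), retrieval over a document set boils down to embedding each chunk once and ranking chunks against a query embedding by cosine similarity. The request payload and the vectors below are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Shape of a request body for the embeddings endpoint (model name assumed):
payload = {"model": "nomic-embed-text", "prompt": "What topics does chapter 3 cover?"}

# With real responses, you would compare the query embedding against each
# stored chunk embedding and keep the highest-scoring chunks.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so the ranking step is just a sort by this value.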

Author
Owner

@igorschlum commented on GitHub (May 7, 2024):

Hi @perezjnv, llama3 is out now. I'm also working for a school. Have you seen version 0.1.33 of Ollama? You can now run several Ollama instances and several models at the same time.
If you could share your project on GitHub, it could be interesting for other schools.

Author
Owner

@igorschlum commented on GitHub (Nov 30, 2024):

Hi @perezjnv, if you need help with this, I can assist as well. I'd be interested to know if you succeed.
