[GH-ISSUE #9969] Some Questions about Using Embedding Models in Ollama #32289

Closed
opened 2026-04-22 13:24:57 -05:00 by GiteaMirror · 2 comments

Originally created by @20246688 on GitHub (Mar 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9969

Hello! I've noticed that embedding (vector generation) models run really fast on Ollama. So far I've found two resources that show how to use them: https://github.com/ollama/ollama-python/blob/main/examples/embed.py and https://ollama.com/blog/embedding-models. But I'd also like to know whether there are any other parameters I need to pay attention to when setting these models up. Are there any specific configuration details I should be aware of?
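
For reference, the basic call from those examples boils down to something like the sketch below (a minimal example using the ollama-python client; `nomic-embed-text` is just an example model name and must be pulled first):

```python
# Minimal embedding call with the ollama-python client.
# Assumes a local Ollama server is running and the example model
# `nomic-embed-text` has already been pulled.
import ollama

response = ollama.embed(model="nomic-embed-text", input="Hello, world!")
print(response.embeddings)  # one embedding vector per input string
```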


@rick-github commented on GitHub (Mar 26, 2025):

There's not much that needs to be tuned for embedding models. However, there is an [outstanding issue](https://github.com/ollama/ollama/issues/7288) regarding context size. TL;DR: explicitly set `num_ctx` in the API call to the context size supported by the model. Setting it higher, or letting ollama use the default of 2048, can cause the runner to crash. The length of the text (chunk size) that you feed to the embedding model should be such that the number of tokens the text is turned into stays below the context size; otherwise you risk losing semantic content in the returned embedding.
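
In practice this means passing `num_ctx` through the `options` field of the embed call. A minimal sketch with the Python client (the model name and its assumed 8192-token context limit are example values; check your model's documentation for the real figure):

```python
import ollama

MODEL = "nomic-embed-text"  # example model; context limit assumed to be 8192 tokens
NUM_CTX = 8192              # set this to the context size your model actually supports

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed each chunk, pinning num_ctx so the runner doesn't fall back
    to the 2048 default (or crash on a value larger than the model supports)."""
    response = ollama.embed(
        model=MODEL,
        input=chunks,                  # a list of strings yields one vector each
        options={"num_ctx": NUM_CTX},  # explicit context size, per the advice above
    )
    return response.embeddings

# Keep each chunk comfortably under the context size. Absent a tokenizer,
# ~4 characters per token is a common rough heuristic (an assumption, not
# an Ollama guarantee), so halve the implied character budget for safety.
text = "some long document " * 500
max_chars = NUM_CTX * 4 // 2
chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
vectors = embed_chunks(chunks)
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```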


@20246688 commented on GitHub (Mar 26, 2025):

> There's not much that needs to be tuned for embedding models. However, there is an [outstanding issue](https://github.com/ollama/ollama/issues/7288) regarding context size. TL;DR: explicitly set `num_ctx` in the API call to the context size supported by the model. Setting it higher, or letting ollama use the default of 2048, can cause the runner to crash. The length of the text (chunk size) that you feed to the embedding model should be such that the number of tokens the text is turned into stays below the context size; otherwise you risk losing semantic content in the returned embedding.

Thank you so much. I already had a vague feeling that there would be context limitations. I really appreciate the confirmation!


Reference: github-starred/ollama#32289