[GH-ISSUE #1624] Some questions about embedding api #26662

Closed
opened 2026-04-22 03:04:56 -05:00 by GiteaMirror · 8 comments

Originally created by @lingen on GitHub (Dec 20, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1624

Hi, I have some questions about the embedding API of Ollama.

According to the Ollama documentation, we can use the embedding API like this:

```shell
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama2",
  "prompt": "Here is an article about llamas..."
}'
```

But I find this API confusing.

As far as I know, models like `llama2` are text-generation models, not embedding models. There are many models built specifically for embedding, for example the BGE embedding models.

I also know that every embedding model has its own maximum token length and embedding dimension.

So what does Ollama's embedding API actually do?

If I use `llama2` through Ollama's embedding API, what are the differences compared with a BGE embedding model? Also, what is the maximum token length of Ollama's embeddings?

If anyone can answer my questions, I would be very grateful.
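For reference, the curl call above can be sketched in Python using only the standard library. This assumes a local Ollama server on the default port and that the response carries the vector under an `"embedding"` field, as the endpoint's documentation describes:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default local Ollama port


def build_request(model: str, prompt: str) -> bytes:
    """Serialize the JSON body that /api/embeddings expects."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")


def get_embedding(model: str, prompt: str) -> list[float]:
    """POST the prompt and return the embedding vector from the response."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


# Example (requires a running Ollama server):
#   vec = get_embedding("llama2", "Here is an article about llamas...")
#   print(len(vec))  # dimensionality depends on the model
```

The vector length you get back depends on the model's hidden size, which is part of what the question above is asking about.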


@Martin7-1 commented on GitHub (Dec 20, 2023):

I also want to know how `llama2` produces embeddings... it has confused me for a few days. In `llama.cpp` it looks like the embeddings are obtained in the decode phase (see [`llama_decode_internal()`](https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp#L6203)), but I don't know exactly how they are computed.
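As a rough conceptual sketch (not a claim about what `llama_decode_internal()` actually does): a decoder-only model produces one hidden-state vector per input token, and a single sentence-level embedding can be derived by pooling those vectors, for example by averaging them or by taking the last token's state:

```python
def mean_pool(hidden_states: list[list[float]]) -> list[float]:
    """Average per-token hidden states into one fixed-size vector."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / n for d in range(dim)]


def last_token_pool(hidden_states: list[list[float]]) -> list[float]:
    """Alternative: use the final token's hidden state as the embedding."""
    return hidden_states[-1]
```

Which pooling strategy (if either) a given backend uses is an implementation detail; the point is only that a generative model's decode pass already yields per-token vectors that can be collapsed into one.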


@iplayfast commented on GitHub (Dec 23, 2023):

From GPT-4:

**User**

How are embeddings used in large language models (LLMs)? I have an API for an LLM that returns embeddings, which looks like this:

```shell
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama2",
  "prompt": "Here is an article about llamas..."
}'
```

My question is: how do I use it?

**ChatGPT**

Embeddings in large language models (LLMs) like GPT-3 or GPT-4 are numerical representations of text data. They are crucial for enabling the model to understand and process language. Here's a breakdown of how embeddings are used and how you can utilize them through an API:

**Understanding Embeddings**

1. Representation: Embeddings convert words, phrases, or entire text blocks into a vector of numbers. Each number in this vector represents a feature of the text, capturing semantic and syntactic information.
2. Dimensionality: These vectors are high-dimensional, meaning they contain many elements. This high dimensionality allows them to capture complex relationships in the language.
3. Contextual Understanding: In LLMs, embeddings are context-dependent. The same word can have different embeddings based on its context, allowing the model to understand nuances in language.

**Using Embeddings via API**

Given your example API call, here's a step-by-step guide on how to use it:

1. Prepare the prompt: the `prompt` field in your API call should contain the text for which you want embeddings. This could be a sentence, a paragraph, or any text snippet. Example: `"prompt": "Llamas are domesticated animals used in the Andes Mountains."`
2. Specify the model: the `model` field is where you specify the language model you wish to use, in this case `llama2`.
3. Make the API call: use the provided curl command to make a request to the API, replacing the example text with your desired prompt. Execute this command in your command line or terminal.
4. Receive the response: the API will return a response containing embeddings for your text. This will be a numerical vector (or a set of vectors).
5. Use the embeddings:
   - Text similarity: compare embeddings to find how similar different texts are. This is useful for applications like semantic search or content recommendation.
   - Text classification: use embeddings as input features for machine learning models to classify texts into categories.
   - Further NLP tasks: embeddings can be used for tasks like sentiment analysis, language translation, or summarization.

**Practical Applications**

- Content analysis: analyzing articles, blogs, or social media posts for sentiment, trends, or user engagement.
- Search and recommendation systems: improving search algorithms by comparing the semantic similarity of search queries and documents.
- Customer support automation: analyzing customer queries to categorize and route them to appropriate departments.

**Things to Note**

- Understanding the output: the output vectors are high-dimensional and often require further processing or machine learning knowledge to be used effectively.
- Customization: depending on the capabilities of your LLM API, you may be able to customize certain parameters like the length of the embeddings or the level of detail.

In summary, embeddings provide a powerful way to quantitatively represent text, and through your API you can leverage these representations for various advanced NLP tasks.
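The text-similarity use mentioned above can be sketched with plain cosine similarity over whatever vectors the API returns. This is a generic sketch, not tied to any particular model or endpoint:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: list[float],
                       doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

For semantic search you would embed the query and each document with the same model, then rank the documents with `rank_by_similarity`; comparing vectors produced by different models is not meaningful, since their dimensions and scales differ.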


@mikkel1156 commented on GitHub (Jan 6, 2024):

LLMs use tokenizers to convert your text into their respective tokens (numerical data) that the LLM can understand. The embeddings API seems to just be using the tokenizer of the respective model; this might be useful for showing or calculating the token limit. So they are not for the same purpose as dedicated embedding models used with, for example, vector databases.

To my knowledge there is no token limit in the tokenizer itself: it will process the text and produce the token output; the token limit is on the LLM side (please correct me if I'm wrong on that).

However, it would be cool to be able to use "normal" embedding models for vector databases (my use case) and other areas.


@logancyang commented on GitHub (Jan 11, 2024):

> The embeddings API seems to just be using the tokenizer of the respective models

@mikkel1156 Interesting, did you search the source code to get to this conclusion? If this is true, then it's not really an embedding API that can be used for semantic search.


@logancyang commented on GitHub (Jan 11, 2024):

I have this exact question; I asked in Discord and didn't get a clear answer. I feel this should be covered in the Ollama docs.


@lingen commented on GitHub (Jan 11, 2024):

@mikkel1156 Thank you for your explanation of this API. I already use BGE for vector embedding.


@mikkel1156 commented on GitHub (Jan 11, 2024):

> > The embeddings API seems to just be using the tokenizer of the respective models
>
> @mikkel1156 Interesting, did you search the source code to get to this conclusion? If this is true, then it's not really an embedding API that can be used for semantic search.

It was simply the logical conclusion, since it takes an LLM model as input and Ollama doesn't (yet?) support "normal" embedding models. So the tokens/vectors would realistically come from the tokenizer.


@mkhludnev commented on GitHub (Jun 18, 2024):

I think this is relevant: https://github.com/ggerganov/llama.cpp/issues/899
