[PR #3642] [CLOSED] Batch embeddings #11230

opened 2026-04-12 23:24:56 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/3642
Author: @jmorganca
Created: 4/15/2024
Status: Closed

Base: main ← Head: jmorganca/batch-embeddings


📝 Commits (1)

  • ad7e641 add batch embeddings

📊 Changes

8 files changed (+241 additions, -70 deletions)


📝 api/types.go (+6 -4)
📝 docs/api.md (+32 -1)
➕ integration/embedding_test.go (+64 -0)
📝 integration/utils_test.go (+83 -0)
📝 llm/ext_server/server.cpp (+12 -39)
📝 llm/server.go (+12 -10)
📝 server/routes.go (+30 -14)
📝 server/sched_test.go (+2 -2)

📄 Description

This change makes it possible to generate embeddings for more than one string at once, which makes it much faster to process many text chunks in a single request to the /api/embeddings endpoint:

 curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm",
  "prompt_batch": [
    "Here is an article about llamas...",
    "Here is another article about llamas...",
    "Here is a third article about llamas...",
    ...
    "Here is yet another article..."
  ]
}'

The API will then respond with an embedding_batch field that contains a list of vector embeddings:

{
  "embedding_batch": [
    [0.010080209001898766 ... 0.308726966381073],
    ...
    [-0.15468695759773254 ... 0.09876912087202072]
  ]
}

TODO:

  • Finalize API
  • Finalize behavior when empty prompt is provided
  • Add more tests

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:24:56 -05:00

Reference: github-starred/ollama#11230