[PR #13429] [MERGED] embeddings: modified batch size #14203

Closed
opened 2026-04-13 00:48:16 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13429
Author: @npardal
Created: 12/11/2025
Status: Merged
Merged: 12/11/2025
Merged by: @npardal

Base: main ← Head: nicole/batch


📝 Commits (6)

  • 16426e4 modified batch size for embedding models
  • 6b297ba simplified
  • 95dc47f added better logging and test cases
  • 1c6e060 udpated batch size
  • b246ea7 refactored server conditional
  • c8f7ae4 set llama.cpp ubatch to batch size

📊 Changes

5 files changed (+78 additions, -7 deletions)


📝 integration/embed_test.go (+57 -0) — see the test sketch after this list
📝 llama/llama.go (+2 -1)
📝 llm/server.go (+7 -0)
📝 runner/llamarunner/runner.go (+1 -1)
📝 runner/ollamarunner/runner.go (+11 -5)
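
For context on the new integration test above (integration/embed_test.go), here is a hedged sketch of the kind of regression test it likely contains: embed an input longer than the old default batch and assert the request succeeds. The model name, input size, and assertions are assumptions, not the PR's actual test; the client calls follow ollama's public Go API.

```go
package integration

import (
	"context"
	"strings"
	"testing"

	"github.com/ollama/ollama/api"
)

// TestEmbedLongInput embeds an input well past a 512-token batch so a
// server with the fix no longer crashes. Model name and input length
// are assumptions for illustration.
func TestEmbedLongInput(t *testing.T) {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		t.Fatal(err)
	}

	// Roughly a few thousand tokens, far beyond a 512-token batch.
	input := strings.Repeat("the quick brown fox jumps over the lazy dog ", 300)

	resp, err := client.Embed(context.Background(), &api.EmbedRequest{
		Model: "all-minilm",
		Input: input,
	})
	if err != nil {
		t.Fatal(err)
	}
	if len(resp.Embeddings) == 0 {
		t.Fatal("expected at least one embedding")
	}
}
```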

📄 Description

This PR detects embedding models and sets batch_size = context_size so the full input fits in a single batch.

Previously, if the batch size was smaller than the input, tokens could be split across batches, which caused a SIGTRAP crash.
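
A minimal sketch of the adjustment, assuming hypothetical option names (the real change spans llm/server.go and the runners and may differ):

```go
package main

import "log/slog"

// Options mirrors the runner options relevant here; the field names are
// assumptions for illustration, not the PR's exact identifiers.
type Options struct {
	NumCtx   int // context size, in tokens
	NumBatch int // logical batch size, in tokens
}

// adjustBatchForEmbedding raises the batch size to the context size for
// embedding models, so any input that fits in the context also fits in
// a single batch and is never split.
func adjustBatchForEmbedding(opts *Options, isEmbedding bool) {
	if isEmbedding && opts.NumBatch < opts.NumCtx {
		slog.Info("embedding model: setting batch size to context size",
			"old", opts.NumBatch, "new", opts.NumCtx)
		opts.NumBatch = opts.NumCtx
	}
}

func main() {
	opts := Options{NumCtx: 8192, NumBatch: 512}
	adjustBatchForEmbedding(&opts, true) // opts.NumBatch is now 8192
}
```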

In the old runner (llamarunner), embedding models are detected via the pooling type.
In the new runner (ollamarunner), they are detected by checking whether the model has a cache.
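
A hedged sketch of the two heuristics (the enum value follows llama.cpp's conventions, and the direction of the cache check is an assumption; neither is the PR's exact code):

```go
package main

import "fmt"

// poolingTypeNone mirrors llama.cpp's LLAMA_POOLING_TYPE_NONE; treating
// 0 as "none" follows the upstream enum but is an assumption here.
const poolingTypeNone = 0

// Old runner: a model reporting any pooling type other than "none" is
// an embedding model.
func isEmbeddingByPooling(poolingType int) bool {
	return poolingType != poolingTypeNone
}

// New runner: generation models allocate a cache while embedding models
// run without one, so a nil cache marks an embedding model (assumed).
type kvCache struct{}

func isEmbeddingByCache(cache *kvCache) bool {
	return cache == nil
}

func main() {
	fmt.Println(isEmbeddingByPooling(1)) // e.g. mean pooling → true
	fmt.Println(isEmbeddingByCache(nil)) // no cache → true
}
```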

This change ensures all tokens stay in one batch and prevents crashes.
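
The final commit ("set llama.cpp ubatch to batch size") closes a related gap: llama.cpp splits each logical batch (n_batch) into physical micro-batches of n_ubatch tokens, so raising the batch size alone is not enough if the micro-batch stays small. A runnable toy illustration of that split (pure Go, not the PR's code):

```go
package main

import "fmt"

// microBatches splits a logical batch of n tokens into physical chunks
// of at most ubatch tokens, mirroring how llama.cpp decomposes n_batch
// into n_ubatch-sized pieces.
func microBatches(n, ubatch int) []int {
	var chunks []int
	for n > 0 {
		c := min(n, ubatch) // built-in min, Go 1.21+
		chunks = append(chunks, c)
		n -= c
	}
	return chunks
}

func main() {
	// With ubatch < batch, a 1000-token embedding input is still split
	// into multiple physical chunks.
	fmt.Println(microBatches(1000, 512)) // [512 488]

	// Setting ubatch equal to the batch size keeps it in one chunk.
	fmt.Println(microBatches(1000, 1000)) // [1000]
}
```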

Fixes: #12938 #13054


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:48:16 -05:00

Reference: github-starred/ollama#14203