[PR #15736] [MERGED] mlxrunner: batch the sampler across multiple sequences #77577

Closed
opened 2026-05-05 10:14:46 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15736
Author: @jessegross
Created: 4/21/2026
Status: Merged
Merged: 4/25/2026
Merged by: @jessegross

Base: main ← Head: jessegross/sampler-batch


📝 Commits (2)

  • 987caa4 mlxrunner: track sampler history in a fixed-size ring buffer
  • 13f1ec9 mlxrunner: batch the sampler across multiple sequences

📊 Changes

8 files changed (+788 additions, -212 deletions)


📝 x/mlxrunner/mlx/ops.go (+7 -3)
📝 x/mlxrunner/mlx/ops_extra.go (+3 -0)
📝 x/mlxrunner/pipeline.go (+36 -18)
📝 x/mlxrunner/runner.go (+5 -3)
📝 x/mlxrunner/sample/logprob_test.go (+53 -2)
📝 x/mlxrunner/sample/sample.go (+415 -122)
📝 x/mlxrunner/sample/sample_test.go (+267 -62)
📝 x/mlxrunner/server.go (+2 -2)

📄 Description

Register sequences with Add/Remove; each Sample call takes any subset of registered slots and samples one token per row, appending to each slot's ring-buffer history. When all slots share Options and penalty rings are full, one fused transform pass runs over the whole batch via a persistent pooled history tensor; otherwise calls fall back to per-slot serial processing indexed against the same pool.
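The control flow described above can be illustrated with a minimal, CPU-only sketch. Everything in it is an assumption for illustration rather than the code in `x/mlxrunner/sample/sample.go`: the `ring`, `slot`, `Options`, and `Sampler` types, their fields, and the `canBatch` helper are hypothetical names, greedy argmax stands in for the real transform chain, and the sketch only reports which path would be taken instead of running a fused pass over a pooled MLX tensor.

```go
package main

import "fmt"

// ring is a fixed-size token history (hypothetical type). Once full,
// each new token overwrites the oldest one.
type ring struct {
	buf  []int32
	next int // index where the next token is written
	n    int // number of valid tokens, capped at len(buf)
}

func newRing(size int) *ring { return &ring{buf: make([]int32, size)} }

func (r *ring) push(tok int32) {
	if len(r.buf) == 0 {
		return // zero-length history: nothing to record
	}
	r.buf[r.next] = tok
	r.next = (r.next + 1) % len(r.buf)
	if r.n < len(r.buf) {
		r.n++
	}
}

func (r *ring) full() bool { return r.n == len(r.buf) }

// Options is a hypothetical, comparable stand-in for the sampler options.
type Options struct {
	Temperature   float32
	RepeatPenalty float32
	RepeatLastN   int // assumed to set the ring size
}

type slot struct {
	opts    Options
	history *ring
}

// Sampler keeps one slot per registered sequence.
type Sampler struct{ slots map[int]*slot }

func NewSampler() *Sampler { return &Sampler{slots: map[int]*slot{}} }

func (s *Sampler) Add(id int, opts Options) {
	s.slots[id] = &slot{opts: opts, history: newRing(opts.RepeatLastN)}
}

func (s *Sampler) Remove(id int) { delete(s.slots, id) }

// canBatch reports whether the slots qualify for a single fused pass:
// identical Options and a full history ring in every slot.
func (s *Sampler) canBatch(ids []int) bool {
	if len(ids) == 0 {
		return false
	}
	first := s.slots[ids[0]]
	for _, id := range ids {
		sl := s.slots[id]
		if sl.opts != first.opts || !sl.history.full() {
			return false
		}
	}
	return true
}

// argmax is a placeholder for the real sampling transforms.
func argmax(row []float32) int32 {
	best := 0
	for i, v := range row {
		if v > row[best] {
			best = i
		}
	}
	return int32(best)
}

// Sample picks one token per row for the given registered slots and
// appends it to each slot's history. The fused result only reports which
// path would have been chosen; both paths compute the same thing here.
// All ids are assumed to have been registered with Add.
func (s *Sampler) Sample(ids []int, logits [][]float32) (tokens []int32, fused bool) {
	fused = s.canBatch(ids)
	tokens = make([]int32, len(ids))
	for i, id := range ids {
		tokens[i] = argmax(logits[i])
		s.slots[id].history.push(tokens[i])
	}
	return tokens, fused
}

func main() {
	s := NewSampler()
	opts := Options{Temperature: 0.8, RepeatPenalty: 1.1, RepeatLastN: 2}
	s.Add(0, opts)
	s.Add(1, opts)
	logits := [][]float32{{0.1, 0.9, 0.0}, {0.7, 0.2, 0.1}}
	for step := 0; step < 3; step++ {
		toks, fused := s.Sample([]int{0, 1}, logits)
		fmt.Println(step, toks, fused)
	}
}
```

With a ring size of 2, the first two calls in this sketch report the serial path because the rings are not yet full, and the third reports the fused path. Keeping `Options` a plain comparable struct lets the eligibility check use `!=` directly; whether the real implementation compares options this way is not stated in the description.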

Performance is unchanged for a single sequence, which is all that is exposed for now.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:14:46 -05:00
Reference: github-starred/ollama#77577