[PR #15736] mlxrunner: batch the sampler across multiple sequences #41156

Open
opened 2026-04-23 01:52:47 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15736
Author: @jessegross
Created: 4/21/2026
Status: 🔄 Open

Base: main ← Head: jessegross/sampler-batch


📝 Commits (2)

  • 5264ba9 mlxrunner: track sampler history in a fixed-size ring buffer
  • a50199c mlxrunner: batch the sampler across multiple sequences

📊 Changes

8 files changed (+759 additions, -212 deletions)

📝 x/mlxrunner/mlx/ops.go (+7 -3)
📝 x/mlxrunner/mlx/ops_extra.go (+3 -0)
📝 x/mlxrunner/pipeline.go (+36 -18)
📝 x/mlxrunner/runner.go (+5 -3)
📝 x/mlxrunner/sample/logprob_test.go (+39 -2)
📝 x/mlxrunner/sample/sample.go (+415 -122)
📝 x/mlxrunner/sample/sample_test.go (+252 -62)
📝 x/mlxrunner/server.go (+2 -2)

📄 Description

Register sequences with Add/Remove; each Sample call takes any subset of registered slots and samples one token per row, appending to each slot's ring-buffer history. When all slots share Options and penalty rings are full, one fused transform pass runs over the whole batch via a persistent pooled history tensor; otherwise calls fall back to per-slot serial processing indexed against the same pool.

Performance is unchanged for a single sequence, which is all that is exposed for now.
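
To make the dispatch rule in the description concrete, here is a minimal sketch in plain Go. It is not the code from this PR: `Options`, `ring`, `slot`, `Sampler`, `canFuse`, and the `pick` callback are hypothetical names chosen for illustration, and the real sampler works on MLX tensors with a pooled history tensor rather than Go slices. The sketch only shows a fixed-size ring-buffer history per slot and the check that decides between the fused and the per-slot path; it reports which path would be taken instead of running a different kernel.

```go
package main

import "fmt"

// Options is a hypothetical stand-in for per-sequence sampling settings.
// It is kept comparable so two slots can be compared with ==.
type Options struct {
	Temperature   float32
	RepeatPenalty float32
	RepeatLastN   int // capacity of the penalty history ring
}

// ring is a fixed-size history of the most recent token IDs.
type ring struct {
	buf  []int32
	next int // index the next push writes to
	n    int // number of valid entries, at most len(buf)
}

func newRing(size int) *ring { return &ring{buf: make([]int32, size)} }

// push appends a token, overwriting the oldest one once the ring is full.
func (r *ring) push(tok int32) {
	if len(r.buf) == 0 {
		return
	}
	r.buf[r.next] = tok
	r.next = (r.next + 1) % len(r.buf)
	if r.n < len(r.buf) {
		r.n++
	}
}

func (r *ring) full() bool { return r.n == len(r.buf) }

// tokens returns the stored history, oldest first.
func (r *ring) tokens() []int32 {
	out := make([]int32, 0, r.n)
	if r.n == 0 {
		return out
	}
	start := (r.next - r.n + len(r.buf)) % len(r.buf)
	for i := 0; i < r.n; i++ {
		out = append(out, r.buf[(start+i)%len(r.buf)])
	}
	return out
}

type slot struct {
	opts    Options
	history *ring
}

// Sampler tracks registered sequences by slot ID (hypothetical type).
type Sampler struct{ slots map[int]*slot }

func NewSampler() *Sampler { return &Sampler{slots: map[int]*slot{}} }

func (s *Sampler) Add(id int, opts Options) {
	s.slots[id] = &slot{opts: opts, history: newRing(opts.RepeatLastN)}
}

func (s *Sampler) Remove(id int) { delete(s.slots, id) }

// canFuse reports whether every requested slot shares the same Options and
// has a full history ring. All ids must already be registered.
func (s *Sampler) canFuse(ids []int) bool {
	for _, id := range ids {
		sl := s.slots[id]
		if sl.opts != s.slots[ids[0]].opts || !sl.history.full() {
			return false
		}
	}
	return len(ids) > 0
}

// Sample returns one token per requested slot and appends it to that slot's
// history. pick stands in for the real sampling transform. fused reports
// which path the call qualified for; the loop itself is the same either way.
func (s *Sampler) Sample(ids []int, pick func(id int, history []int32) int32) (tokens []int32, fused bool, err error) {
	for _, id := range ids {
		if _, ok := s.slots[id]; !ok {
			return nil, false, fmt.Errorf("slot %d is not registered", id)
		}
	}
	fused = s.canFuse(ids)
	tokens = make([]int32, len(ids))
	for i, id := range ids {
		sl := s.slots[id]
		tokens[i] = pick(id, sl.history.tokens())
		sl.history.push(tokens[i])
	}
	return tokens, fused, nil
}

func main() {
	s := NewSampler()
	opts := Options{Temperature: 0.8, RepeatPenalty: 1.1, RepeatLastN: 2}
	s.Add(0, opts)
	s.Add(1, opts)
	next := int32(0)
	pick := func(int, []int32) int32 { next++; return next }
	for step := 0; step < 3; step++ {
		toks, fused, _ := s.Sample([]int{0, 1}, pick)
		fmt.Println(step, toks, fused)
	}
}
```

With a ring capacity of 2, the first two `Sample` calls in `main` report the serial path because the rings are still filling, and the third reports the fused path. Slots registered with different `Options` would keep every call that mixes them on the serial path.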


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-23 01:52:47 -05:00
Reference: github-starred/ollama#41156