[PR #15658] [MERGED] MLX Sampler Improvements #41125

Closed
opened 2026-04-23 01:51:20 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15658
Author: @jessegross
Created: 4/17/2026
Status: Merged
Merged: 4/21/2026
Merged by: @jessegross

Base: main ← Head: jessegross/sampler


📝 Commits (3)

  • 1e43d7f mlxrunner: add logprobs support
  • ea677b2 mlxrunner: use MaxAxis in the min-P sampler
  • 00a2cc3 mlxrunner: fuse top-P and top-K into a single sort pass

📊 Changes

12 files changed (+501 additions, -174 deletions)

View changed files

📝 integration/api_test.go (+0 -8)
📝 x/mlxrunner/client.go (+13 -29)
📝 x/mlxrunner/mlx/array.go (+6 -0)
📝 x/mlxrunner/mlx/ops.go (+6 -0)
📝 x/mlxrunner/pipeline.go (+78 -36)
📝 x/mlxrunner/runner.go (+1 -19)
➕ x/mlxrunner/sample/logprob_test.go (+249 -0)
📝 x/mlxrunner/sample/sample.go (+124 -58)
📝 x/mlxrunner/sample/sample_test.go (+7 -8)
📝 x/mlxrunner/server.go (+13 -14)
📝 x/models/gemma4/gemma4_moe_test.go (+2 -0)
📝 x/models/nn/nn_test.go (+2 -2)

📄 Description

This makes several related improvements to the sampler:

  • Support logprobs in the MLX runner, connected to the same external interface as other runners
  • Avoid multiple sorts in the common case where the top-P and top-K sampler filters are both used; this improves overall generation throughput by ~1.5% with gemma4
  • Slightly optimize the min-P sampler filter

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-23 01:51:20 -05:00

Reference: github-starred/ollama#41125