[PR #14768] [MERGED] mlx: perf improvements #20096

Closed
opened 2026-04-16 07:26:28 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14768
Author: @dhiltgen
Created: 3/10/2026
Status: Merged
Merged: 3/12/2026
Merged by: @dhiltgen

Base: `main` ← Head: `mlx_perf`


📝 Commits (2)

- `66bf820` mlx: perf improvements
- `2087381` review comments

📊 Changes

7 files changed (+185 additions, -37 deletions)


📝 x/imagegen/nn/nn_test.go (+5 -5)
📝 x/mlxrunner/mlx/dynamic.go (+16 -0)
📝 x/mlxrunner/mlx/ops_extra.go (+13 -0)
📝 x/models/gemma3/gemma3.go (+2 -6)
📝 x/models/llama/llama.go (+2 -6)
📝 x/models/nn/nn.go (+1 -20)
➕ x/models/nn/nn_test.go (+146 -0)

📄 Description

No current model exercises the fix in nn.go, so I added a unit test to ensure correctness.

benchstat tmp/before.txt tmp/after.txt
goos: darwin
goarch: arm64
                                             │ tmp/before.txt │           tmp/after.txt            │
                                             │   sec/token    │  sec/token   vs base               │
Model/name=gemma3-4b:bf16-mlx/step=prefill        609.0µ ± 0%   607.9µ ± 0%   -0.19% (p=0.026 n=6)
Model/name=gemma3-4b:bf16-mlx/step=generate       22.10m ± 0%   22.21m ± 0%   +0.48% (p=0.002 n=6)
Model/name=gemma3-4b:int4-mlx/step=prefill        775.4µ ± 6%   737.1µ ± 3%   -4.95% (p=0.004 n=6)
Model/name=gemma3-4b:int4-mlx/step=generate       15.88m ± 1%   13.52m ± 0%  -14.88% (p=0.002 n=6)
Model/name=llama31-8b:bf16-mlx/step=prefill       1.332m ± 2%   1.346m ± 6%        ~ (p=0.240 n=6)
Model/name=llama31-8b:bf16-mlx/step=generate      48.34m ± 1%   42.71m ± 0%  -11.64% (p=0.002 n=6)
Model/name=llama31-8b:int4-mlx/step=prefill       1.387m ± 5%   1.366m ± 2%        ~ (p=0.132 n=6)
Model/name=llama31-8b:int4-mlx/step=generate      23.00m ± 2%   16.46m ± 1%  -28.42% (p=0.002 n=6)
geomean                                           4.915m        4.519m        -8.06%

                                             │ tmp/before.txt │           tmp/after.txt            │
                                             │   token/sec    │  token/sec   vs base               │
Model/name=gemma3-4b:bf16-mlx/step=prefill        1.642k ± 0%   1.645k ± 0%   +0.19% (p=0.022 n=6)
Model/name=gemma3-4b:bf16-mlx/step=generate        45.25 ± 0%    45.03 ± 0%   -0.48% (p=0.002 n=6)
Model/name=gemma3-4b:int4-mlx/step=prefill        1.290k ± 5%   1.357k ± 3%   +5.21% (p=0.004 n=6)
Model/name=gemma3-4b:int4-mlx/step=generate        62.98 ± 1%    73.98 ± 0%  +17.48% (p=0.002 n=6)
Model/name=llama31-8b:bf16-mlx/step=prefill        751.0 ± 2%    743.0 ± 5%        ~ (p=0.240 n=6)
Model/name=llama31-8b:bf16-mlx/step=generate       20.69 ± 1%    23.41 ± 0%  +13.17% (p=0.002 n=6)
Model/name=llama31-8b:int4-mlx/step=prefill        720.9 ± 4%    732.2 ± 2%        ~ (p=0.132 n=6)
Model/name=llama31-8b:int4-mlx/step=generate       43.48 ± 2%    60.75 ± 1%  +39.70% (p=0.002 n=6)
geomean                                            203.5         221.3        +8.76%

                                          │ tmp/before.txt │            tmp/after.txt            │
                                          │     sec/op     │    sec/op     vs base               │
Model/name=gemma3-4b:bf16-mlx/step=ttft         1.158 ± 1%    1.161 ±  0%        ~ (p=0.485 n=6)
Model/name=gemma3-4b:bf16-mlx/step=load        188.9m ± 4%   190.4m ±  4%        ~ (p=0.485 n=6)
Model/name=gemma3-4b:bf16-mlx/step=total        4.225 ± 0%    4.280 ±  0%   +1.30% (p=0.002 n=6)
Model/name=gemma3-4b:int4-mlx/step=ttft         1.273 ± 5%    1.206 ±  3%   -5.30% (p=0.004 n=6)
Model/name=gemma3-4b:int4-mlx/step=load        43.17m ± 5%   38.02m ± 10%  -11.91% (p=0.002 n=6)
Model/name=gemma3-4b:int4-mlx/step=total        3.308 ± 2%    2.933 ±  1%  -11.34% (p=0.002 n=6)
Model/name=llama31-8b:bf16-mlx/step=ttft        2.200 ± 2%    2.188 ±  6%        ~ (p=1.000 n=6)
Model/name=llama31-8b:bf16-mlx/step=load       28.62m ± 7%   26.00m ± 16%   -9.17% (p=0.041 n=6)
Model/name=llama31-8b:bf16-mlx/step=total       8.406 ± 1%    7.654 ±  2%   -8.94% (p=0.002 n=6)
Model/name=llama31-8b:int4-mlx/step=ttft        2.415 ± 4%    2.224 ±  2%   -7.94% (p=0.002 n=6)
Model/name=llama31-8b:int4-mlx/step=load       28.01m ± 3%   31.96m ± 21%        ~ (p=0.065 n=6)
Model/name=llama31-8b:int4-mlx/step=total       5.351 ± 2%    4.334 ±  1%  -19.02% (p=0.002 n=6)
geomean                                        751.0m        712.3m         -5.15%

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 07:26:28 -05:00
Reference: github-starred/ollama#20096