[PR #14789] [MERGED] mlx: update as of 3/23 #14839

Closed
opened 2026-04-13 01:03:42 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14789
Author: @dhiltgen
Created: 3/12/2026
Status: Merged
Merged: 3/23/2026
Merged by: @dhiltgen

Base: main ← Head: mlx_bump


📝 Commits (3)

  • 706dc9d mlx: update to HEAD on 3/23
  • b22896e CUDA Fast Gated Delta kernel
  • a28860f mlx: detect eval errors and panic

📊 Changes

18 files changed (+497 additions, -55 deletions)


📝 Dockerfile (+1 -1)
➖ MLX_CORE_VERSION (+0 -1)
➕ MLX_C_VERSION (+1 -0)
📝 MLX_VERSION (+1 -1)
📝 x/imagegen/mlx/CMakeLists.txt (+13 -4)
📝 x/imagegen/mlx/mlx.c (+55 -15)
📝 x/imagegen/mlx/mlx.go (+4 -2)
📝 x/imagegen/mlx/mlx.h (+26 -10)
📝 x/mlxrunner/client.go (+10 -1)
📝 x/mlxrunner/mlx/CMakeLists.txt (+3 -1)
📝 x/mlxrunner/mlx/gated_delta.go (+258 -1)
📝 x/mlxrunner/mlx/generated.c (+16 -2)
📝 x/mlxrunner/mlx/generated.h (+45 -9)
📝 x/mlxrunner/mlx/include/mlx/c/README.md (+1 -1)
📝 x/mlxrunner/mlx/include/mlx/c/distributed_group.h (+4 -2)
📝 x/mlxrunner/mlx/include/mlx/c/ops.h (+8 -0)
📝 x/mlxrunner/mlx/mlx.go (+47 -2)
📝 x/mlxrunner/mlx/ops_extra.go (+4 -2)

📄 Description

Update MLX to HEAD as of Mar 16, along with MLX-C. This also fixes a few miscellaneous vendoring bugs uncovered by this first update, and renames the version files to make them clearer.

Added CUDA Fast Gated Delta kernel.
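For readers unfamiliar with the operation, here is a minimal CPU sketch of one step of the gated delta rule as it is commonly formulated: the state is decayed by a gate, then corrected toward the new value along the key direction. This is not the CUDA kernel from this PR. The function name `gatedDeltaStep`, the single-head layout, and the scalar `alpha`/`beta` gates are assumptions made for illustration only.

```go
package main

import "fmt"

// gatedDeltaStep applies one token of the gated delta rule to the state s
// (dv rows by dk columns), updating s in place, and returns the output S*q.
// Hypothetical reference helper; not the API or kernel added in this PR.
//
//	S <- alpha*S + beta*(v - alpha*S*k) * k^T
//	o  = S*q
func gatedDeltaStep(s [][]float64, q, k, v []float64, alpha, beta float64) []float64 {
	// Decay the previous state by the gate alpha.
	for i := range s {
		for j := range s[i] {
			s[i][j] *= alpha
		}
	}
	// Delta-rule correction: move the value stored under key k toward v.
	for i := range s {
		pred := 0.0
		for j, kj := range k {
			pred += s[i][j] * kj
		}
		d := beta * (v[i] - pred)
		for j, kj := range k {
			s[i][j] += d * kj
		}
	}
	// Read out with the query.
	out := make([]float64, len(s))
	for i := range s {
		for j, qj := range q {
			out[i] += s[i][j] * qj
		}
	}
	return out
}

func main() {
	s := [][]float64{{0, 0}, {0, 0}}
	o := gatedDeltaStep(s, []float64{1, 0}, []float64{1, 0}, []float64{3, 4}, 1, 1)
	fmt.Println(o, s)
}
```

The loop over tokens is inherently sequential in this form, which is why a fused GPU kernel can make a large difference for long prompts.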

Performance results for qwen 3.5 0.8b on Windows 11 with an RTX 5090 (2048 prompt tokens, 128 generated tokens):

| Model | Metric | Before (fallback) | After (CUDA kernel) | Speedup |
|---|---|---|---|---|
| int4-mlx | Prefill | 529 tok/s | **3,376 tok/s** | **6.4x** |
| int8-mlx | Prefill | 530 tok/s | **3,278 tok/s** | **6.2x** |
| bf16-mlx | Prefill | 605 tok/s | **24,018 tok/s** | **39.7x** |
| int4-mlx | Generate | 121 tok/s | 137 tok/s | 1.1x |
| bf16-mlx | Generate | 170 tok/s | 177 tok/s | 1.04x |
| bf16-mlx | TTFT | 2,741ms | **125ms** | **22x** |
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 01:03:42 -05:00
Reference: github-starred/ollama#14839