[PR #15793] mlx: update to 0.31.2 #46509

Open
opened 2026-04-25 01:55:43 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15793
Author: @pd95
Created: 4/24/2026
Status: 🔄 Open

Base: mainHead: codex/mlx-0.31.2-upgrade


📝 Commits (2)

  • 75f395c mlx: test addmm broadcast bias
  • bdd7fdd mlx: update to 0.31.2

📊 Changes

19 files changed (+1341 additions, -126 deletions)

View changed files

📝 MLX_C_VERSION (+1 -1)
📝 MLX_VERSION (+1 -1)
📝 x/imagegen/mlx/mlx.c (+339 -49)
📝 x/imagegen/mlx/mlx.h (+146 -30)
📝 x/mlxrunner/mlx/dynamic.go (+11 -0)
📝 x/mlxrunner/mlx/generated.c (+158 -5)
📝 x/mlxrunner/mlx/generated.h (+391 -25)
📝 x/mlxrunner/mlx/include/mlx/c/compile.h (+1 -0)
📝 x/mlxrunner/mlx/include/mlx/c/distributed_group.h (+23 -9)
📝 x/mlxrunner/mlx/include/mlx/c/fft.h (+20 -0)
x/mlxrunner/mlx/include/mlx/c/graph_utils.h (+61 -0)
📝 x/mlxrunner/mlx/include/mlx/c/io.h (+5 -0)
📝 x/mlxrunner/mlx/include/mlx/c/io_types.h (+46 -0)
📝 x/mlxrunner/mlx/include/mlx/c/mlx.h (+1 -0)
📝 x/mlxrunner/mlx/include/mlx/c/ops.h (+44 -0)
📝 x/mlxrunner/mlx/mlx.go (+1 -4)
x/mlxrunner/mlx/ops_test.go (+87 -0)
📝 x/mlxrunner/mlx/stream.go (+4 -2)
📝 x/mlxrunner/runner.go (+1 -0)

📄 Description

Note: This is my first contribution to Ollama. I used Codex to help investigate the MLX addmm issue, create
the focused regression test, and prepare this PR. I manually reproduced the fail/pass behavior on
macOS before marking this ready for review.

Summary

Updates the vendored MLX/MLX-C pins to MLX 0.31.2 and refreshes the generated C bindings used by the MLX runner.

This pulls in upstream MLX fixes, notably the NAX AddMM fix from ml-explore/mlx#3422, which is included in MLX v0.31.2.

This PR also adds an Ollama-side regression test for the observed addmm broadcast-bias failure. The commits are intentionally ordered so reviewers can reproduce the bug first, then verify that the MLX upgrade
fixes it:

  1. mlx: test addmm broadcast bias
  2. mlx: update to 0.31.2

Why this update is needed

Ollama currently exposes and uses MLX addmm for biased linear layers, so the upstream MLX AddMM fix is directly relevant to the runner rather than only a general dependency refresh.

The current call path is:

  • x/models/nn/nn.go: Linear.Forward uses l.Bias.Addmm(x, w, 1.0, 1.0) for dense linear layers with bias.
  • x/mlxrunner/mlx/nn.go: the MLX runner Linear.Forward also uses m.Bias.Addmm(x, w, 1.0, 1.0).
  • x/mlxrunner/mlx/ops.go: Array.Addmm maps that Go call to C.mlx_addmm(...).
  • x/imagegen/mlx/mlx.go: imagegen also exposes AddMM, which calls C.mlx_addmm(...).

Without this update, those biased MLX linear paths can drop the broadcast bias for batched inputs. Updating MLX/MLX-C pulls in the upstream fix at the dependency layer, preserving Ollama’s existing AddMM call
sites instead of carrying an Ollama-side matmul + add workaround.

Details

Pin updates:

  • MLX_VERSION: 38ad257088fb2193ad47e527cf6534a689f30943 -> 68cf2fddd8de5edd8ab3d926391772b2e2cedad8
  • MLX_C_VERSION: 0726ca922fc902c4c61ef9c27d94132be418e945 -> fba4470b89073180056c9ea46c443051375f7399

The CMake MLX build regenerated the vendored MLX-C headers and Go binding files under:

  • x/mlxrunner/mlx/generated.*
  • x/mlxrunner/mlx/include/mlx/c/*
  • x/imagegen/mlx/mlx.*

MLX 0.31 also makes default streams and Metal command encoders thread-local. Since Go goroutines can move between OS threads, a lazy MLX graph can otherwise be built using one thread’s stream and later evaluated
on another thread. That can fail with errors like:

mlx: There is no Stream(gpu, N) in current thread

To match MLX 0.31’s thread-local stream model, this PR updates the Go wrapper so MLX work pins the current goroutine to its OS thread and resolves the default stream per thread instead of caching it globally.

The OS-thread pin is intentionally not paired with runtime.UnlockOSThread(): MLX lazy graph construction and later evaluation need to stay on the same OS thread for the lifetime of that MLX goroutine.

Regression Test

Adds:

go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v

The test uses zero input to isolate the broadcast-bias term. On the old MLX version, bias.Addmm(x, w, 1, 1) matches matmul-only, proving the bias is dropped. With MLX 0.31.2, addmm matches matmul + add.

Reviewer reproduction flow:

git checkout <test-commit>
./scripts/build_darwin.sh -a arm64 build
go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v

Expected old behavior:

max diff addmm vs matmul+add: 3.000000
max diff addmm vs matmul-only: 0.000000

Then:

git checkout <mlx-upgrade-commit>
./scripts/build_darwin.sh -a arm64 build
go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v

Expected new behavior:

max diff addmm vs matmul+add: 0.000000
max diff addmm vs matmul-only: 3.000000

Validation

  • Added TestAddmmBroadcastBiasMatchesMatmulAdd.
  • Verified the test fails on the pre-MLX-0.31.2 commit.
  • Verified the test passes after the MLX 0.31.2 update.
  • In the Linux container, the same targeted go test command passes by skipping MLX runtime because the macOS MLX dylib is unavailable.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/ollama/ollama/pull/15793 **Author:** [@pd95](https://github.com/pd95) **Created:** 4/24/2026 **Status:** 🔄 Open **Base:** `main` ← **Head:** `codex/mlx-0.31.2-upgrade` --- ### 📝 Commits (2) - [`75f395c`](https://github.com/ollama/ollama/commit/75f395c2c11e4adab54f46e12ab56a0f7f010c53) mlx: test addmm broadcast bias - [`bdd7fdd`](https://github.com/ollama/ollama/commit/bdd7fdd171be290099d368aacb747f9a1241299a) mlx: update to 0.31.2 ### 📊 Changes **19 files changed** (+1341 additions, -126 deletions) <details> <summary>View changed files</summary> 📝 `MLX_C_VERSION` (+1 -1) 📝 `MLX_VERSION` (+1 -1) 📝 `x/imagegen/mlx/mlx.c` (+339 -49) 📝 `x/imagegen/mlx/mlx.h` (+146 -30) 📝 `x/mlxrunner/mlx/dynamic.go` (+11 -0) 📝 `x/mlxrunner/mlx/generated.c` (+158 -5) 📝 `x/mlxrunner/mlx/generated.h` (+391 -25) 📝 `x/mlxrunner/mlx/include/mlx/c/compile.h` (+1 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/distributed_group.h` (+23 -9) 📝 `x/mlxrunner/mlx/include/mlx/c/fft.h` (+20 -0) ➕ `x/mlxrunner/mlx/include/mlx/c/graph_utils.h` (+61 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/io.h` (+5 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/io_types.h` (+46 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/mlx.h` (+1 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/ops.h` (+44 -0) 📝 `x/mlxrunner/mlx/mlx.go` (+1 -4) ➕ `x/mlxrunner/mlx/ops_test.go` (+87 -0) 📝 `x/mlxrunner/mlx/stream.go` (+4 -2) 📝 `x/mlxrunner/runner.go` (+1 -0) </details> ### 📄 Description > Note: This is my first contribution to Ollama. I used Codex to help investigate the MLX `addmm` issue, create > the focused regression test, and prepare this PR. I manually reproduced the fail/pass behavior on > macOS before marking this ready for review. ## Summary Updates the vendored MLX/MLX-C pins to MLX 0.31.2 and refreshes the generated C bindings used by the MLX runner. This pulls in upstream MLX fixes, notably the NAX AddMM fix from ml-explore/mlx#3422, which is included in MLX v0.31.2. This PR also adds an Ollama-side regression test for the observed `addmm` broadcast-bias failure. The commits are intentionally ordered so reviewers can reproduce the bug first, then verify that the MLX upgrade fixes it: 1. `mlx: test addmm broadcast bias` 2. `mlx: update to 0.31.2` ## Why this update is needed Ollama currently exposes and uses MLX `addmm` for biased linear layers, so the upstream MLX AddMM fix is directly relevant to the runner rather than only a general dependency refresh. The current call path is: - `x/models/nn/nn.go`: `Linear.Forward` uses `l.Bias.Addmm(x, w, 1.0, 1.0)` for dense linear layers with bias. - `x/mlxrunner/mlx/nn.go`: the MLX runner `Linear.Forward` also uses `m.Bias.Addmm(x, w, 1.0, 1.0)`. - `x/mlxrunner/mlx/ops.go`: `Array.Addmm` maps that Go call to `C.mlx_addmm(...)`. - `x/imagegen/mlx/mlx.go`: imagegen also exposes `AddMM`, which calls `C.mlx_addmm(...)`. Without this update, those biased MLX linear paths can drop the broadcast bias for batched inputs. Updating MLX/MLX-C pulls in the upstream fix at the dependency layer, preserving Ollama’s existing AddMM call sites instead of carrying an Ollama-side `matmul + add` workaround. ## Details Pin updates: - `MLX_VERSION`: `38ad257088fb2193ad47e527cf6534a689f30943` -> `68cf2fddd8de5edd8ab3d926391772b2e2cedad8` - `MLX_C_VERSION`: `0726ca922fc902c4c61ef9c27d94132be418e945` -> `fba4470b89073180056c9ea46c443051375f7399` The CMake MLX build regenerated the vendored MLX-C headers and Go binding files under: - `x/mlxrunner/mlx/generated.*` - `x/mlxrunner/mlx/include/mlx/c/*` - `x/imagegen/mlx/mlx.*` MLX 0.31 also makes default streams and Metal command encoders thread-local. Since Go goroutines can move between OS threads, a lazy MLX graph can otherwise be built using one thread’s stream and later evaluated on another thread. That can fail with errors like: ```text mlx: There is no Stream(gpu, N) in current thread ``` To match MLX 0.31’s thread-local stream model, this PR updates the Go wrapper so MLX work pins the current goroutine to its OS thread and resolves the default stream per thread instead of caching it globally. The OS-thread pin is intentionally not paired with runtime.UnlockOSThread(): MLX lazy graph construction and later evaluation need to stay on the same OS thread for the lifetime of that MLX goroutine. ## Regression Test Adds: ```sh go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v ``` The test uses zero input to isolate the broadcast-bias term. On the old MLX version, bias.Addmm(x, w, 1, 1) matches matmul-only, proving the bias is dropped. With MLX 0.31.2, addmm matches matmul + add. Reviewer reproduction flow: ```sh git checkout <test-commit> ./scripts/build_darwin.sh -a arm64 build go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v ``` Expected old behavior: ```text max diff addmm vs matmul+add: 3.000000 max diff addmm vs matmul-only: 0.000000 ``` Then: ```sh git checkout <mlx-upgrade-commit> ./scripts/build_darwin.sh -a arm64 build go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v ``` Expected new behavior: ```text max diff addmm vs matmul+add: 0.000000 max diff addmm vs matmul-only: 3.000000 ``` ## Validation - Added TestAddmmBroadcastBiasMatchesMatmulAdd. - Verified the test fails on the pre-MLX-0.31.2 commit. - Verified the test passes after the MLX 0.31.2 update. - In the Linux container, the same targeted go test command passes by skipping MLX runtime because the macOS MLX dylib is unavailable. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
GiteaMirror added the pull-request label 2026-04-25 01:55:43 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#46509