[PR #15793] [CLOSED] mlx: update to 0.31.2 #77605

Closed
opened 2026-05-05 10:16:35 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15793
Author: @pd95
Created: 4/24/2026
Status: Closed

Base: mainHead: codex/mlx-0.31.2-upgrade


📝 Commits (3)

  • 75f395c mlx: test addmm broadcast bias
  • bdd7fdd mlx: update to 0.31.2
  • aca02ae darwin: package standalone MLX JACCL library

📊 Changes

21 files changed (+1353 additions, -128 deletions)

View changed files

📝 CMakeLists.txt (+8 -0)
📝 MLX_C_VERSION (+1 -1)
📝 MLX_VERSION (+1 -1)
📝 scripts/build_darwin.sh (+4 -2)
📝 x/imagegen/mlx/mlx.c (+339 -49)
📝 x/imagegen/mlx/mlx.h (+146 -30)
📝 x/mlxrunner/mlx/dynamic.go (+11 -0)
📝 x/mlxrunner/mlx/generated.c (+158 -5)
📝 x/mlxrunner/mlx/generated.h (+391 -25)
📝 x/mlxrunner/mlx/include/mlx/c/compile.h (+1 -0)
📝 x/mlxrunner/mlx/include/mlx/c/distributed_group.h (+23 -9)
📝 x/mlxrunner/mlx/include/mlx/c/fft.h (+20 -0)
x/mlxrunner/mlx/include/mlx/c/graph_utils.h (+61 -0)
📝 x/mlxrunner/mlx/include/mlx/c/io.h (+5 -0)
📝 x/mlxrunner/mlx/include/mlx/c/io_types.h (+46 -0)
📝 x/mlxrunner/mlx/include/mlx/c/mlx.h (+1 -0)
📝 x/mlxrunner/mlx/include/mlx/c/ops.h (+44 -0)
📝 x/mlxrunner/mlx/mlx.go (+1 -4)
x/mlxrunner/mlx/ops_test.go (+87 -0)
📝 x/mlxrunner/mlx/stream.go (+4 -2)

...and 1 more files

📄 Description

Note: This is my first contribution to Ollama. I used Codex to help investigate the MLX addmm issue, create
the focused regression test, and prepare this PR. I manually reproduced the fail/pass behavior on
macOS before marking this ready for review.

Summary

Updates the vendored MLX/MLX-C pins to MLX 0.31.2 and refreshes the generated C bindings used by the MLX runner.

This pulls in upstream MLX fixes, notably the NAX AddMM fix from ml-explore/mlx#3422, which is included in MLX v0.31.2.

This PR also adds an Ollama-side regression test for the observed addmm broadcast-bias failure. The commits are intentionally ordered so reviewers can reproduce the bug first, then verify that the MLX upgrade
fixes it:

  1. mlx: test addmm broadcast bias
  2. mlx: update to 0.31.2

Why this update is needed

Ollama currently exposes and uses MLX addmm for biased linear layers, so the upstream MLX AddMM fix is directly relevant to the runner rather than only a general dependency refresh.

The current call path is:

  • x/models/nn/nn.go: Linear.Forward uses l.Bias.Addmm(x, w, 1.0, 1.0) for dense linear layers with bias.
  • x/mlxrunner/mlx/nn.go: the MLX runner Linear.Forward also uses m.Bias.Addmm(x, w, 1.0, 1.0).
  • x/mlxrunner/mlx/ops.go: Array.Addmm maps that Go call to C.mlx_addmm(...).
  • x/imagegen/mlx/mlx.go: imagegen also exposes AddMM, which calls C.mlx_addmm(...).

Without this update, those biased MLX linear paths can drop the broadcast bias for batched inputs. Updating MLX/MLX-C pulls in the upstream fix at the dependency layer, preserving Ollama’s existing AddMM call
sites instead of carrying an Ollama-side matmul + add workaround.

Details

Pin updates:

  • MLX_VERSION: 38ad257088fb2193ad47e527cf6534a689f30943 -> 68cf2fddd8de5edd8ab3d926391772b2e2cedad8
  • MLX_C_VERSION: 0726ca922fc902c4c61ef9c27d94132be418e945 -> fba4470b89073180056c9ea46c443051375f7399

The CMake MLX build regenerated the vendored MLX-C headers and Go binding files under:

  • x/mlxrunner/mlx/generated.*
  • x/mlxrunner/mlx/include/mlx/c/*
  • x/imagegen/mlx/mlx.*

MLX 0.31 also makes default streams and Metal command encoders thread-local. Since Go goroutines can move between OS threads, a lazy MLX graph can otherwise be built using one thread’s stream and later evaluated
on another thread. That can fail with errors like:

mlx: There is no Stream(gpu, N) in current thread

To match MLX 0.31’s thread-local stream model, this PR updates the Go wrapper so MLX work pins the current goroutine to its OS thread and resolves the default stream per thread instead of caching it globally.

The OS-thread pin is intentionally not paired with runtime.UnlockOSThread(): MLX lazy graph construction and later evaluation need to stay on the same OS thread for the lifetime of that MLX goroutine.

Update: macOS app packaging note

Update after app-bundle testing: MLX 0.31.2 also changes the runtime library layout: JACCL is now a standalone library. On macOS arm64, libmlx.dylib depends on @rpath/libjaccl.dylib.

This matters for the signed Ollama.app bundle. The app could package libmlx.dylib, libmlxc.dylib, and mlx.metallib, but still fail to load MLX at runtime if libjaccl.dylib was not
installed into the same mlx_metal_v* resource directory.

Observed failure mode:

MLX not available: failed to load MLX dynamic library

with logs showing libmlxc.dylib was found under:

/Applications/Ollama.app/Contents/Resources/mlx_metal_v3/libmlxc.dylib

The fix installs the jaccl target as part of the MLX CMake install component and ensures the app packaging step carries auxiliary MLX runtime dylibs, such as libjaccl.dylib, into Contents/
Resources/mlx_metal_v*/.

Regression Test

Adds:

go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v

The test uses zero input to isolate the broadcast-bias term. On the old MLX version, bias.Addmm(x, w, 1, 1) matches matmul-only, proving the bias is dropped. With MLX 0.31.2, addmm matches matmul + add.

Reviewer reproduction flow:

git checkout <test-commit>
./scripts/build_darwin.sh -a arm64 build
go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v

Expected old behavior:

max diff addmm vs matmul+add: 3.000000
max diff addmm vs matmul-only: 0.000000

Then:

git checkout <mlx-upgrade-commit>
./scripts/build_darwin.sh -a arm64 build
go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v

Expected new behavior:

max diff addmm vs matmul+add: 0.000000
max diff addmm vs matmul-only: 3.000000

Validation

  • Added TestAddmmBroadcastBiasMatchesMatmulAdd.
  • Verified the test fails on the pre-MLX-0.31.2 commit.
  • Verified the test passes after the MLX 0.31.2 update.
  • In the Linux container, the same targeted go test command passes by skipping MLX runtime because the macOS MLX dylib is unavailable.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/ollama/ollama/pull/15793 **Author:** [@pd95](https://github.com/pd95) **Created:** 4/24/2026 **Status:** ❌ Closed **Base:** `main` ← **Head:** `codex/mlx-0.31.2-upgrade` --- ### 📝 Commits (3) - [`75f395c`](https://github.com/ollama/ollama/commit/75f395c2c11e4adab54f46e12ab56a0f7f010c53) mlx: test addmm broadcast bias - [`bdd7fdd`](https://github.com/ollama/ollama/commit/bdd7fdd171be290099d368aacb747f9a1241299a) mlx: update to 0.31.2 - [`aca02ae`](https://github.com/ollama/ollama/commit/aca02ae7eaee279319f2be6eba542d3869426af3) darwin: package standalone MLX JACCL library ### 📊 Changes **21 files changed** (+1353 additions, -128 deletions) <details> <summary>View changed files</summary> 📝 `CMakeLists.txt` (+8 -0) 📝 `MLX_C_VERSION` (+1 -1) 📝 `MLX_VERSION` (+1 -1) 📝 `scripts/build_darwin.sh` (+4 -2) 📝 `x/imagegen/mlx/mlx.c` (+339 -49) 📝 `x/imagegen/mlx/mlx.h` (+146 -30) 📝 `x/mlxrunner/mlx/dynamic.go` (+11 -0) 📝 `x/mlxrunner/mlx/generated.c` (+158 -5) 📝 `x/mlxrunner/mlx/generated.h` (+391 -25) 📝 `x/mlxrunner/mlx/include/mlx/c/compile.h` (+1 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/distributed_group.h` (+23 -9) 📝 `x/mlxrunner/mlx/include/mlx/c/fft.h` (+20 -0) ➕ `x/mlxrunner/mlx/include/mlx/c/graph_utils.h` (+61 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/io.h` (+5 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/io_types.h` (+46 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/mlx.h` (+1 -0) 📝 `x/mlxrunner/mlx/include/mlx/c/ops.h` (+44 -0) 📝 `x/mlxrunner/mlx/mlx.go` (+1 -4) ➕ `x/mlxrunner/mlx/ops_test.go` (+87 -0) 📝 `x/mlxrunner/mlx/stream.go` (+4 -2) _...and 1 more files_ </details> ### 📄 Description > Note: This is my first contribution to Ollama. I used Codex to help investigate the MLX `addmm` issue, create > the focused regression test, and prepare this PR. I manually reproduced the fail/pass behavior on > macOS before marking this ready for review. ## Summary Updates the vendored MLX/MLX-C pins to MLX 0.31.2 and refreshes the generated C bindings used by the MLX runner. This pulls in upstream MLX fixes, notably the NAX AddMM fix from ml-explore/mlx#3422, which is included in MLX v0.31.2. This PR also adds an Ollama-side regression test for the observed `addmm` broadcast-bias failure. The commits are intentionally ordered so reviewers can reproduce the bug first, then verify that the MLX upgrade fixes it: 1. `mlx: test addmm broadcast bias` 2. `mlx: update to 0.31.2` ## Why this update is needed Ollama currently exposes and uses MLX `addmm` for biased linear layers, so the upstream MLX AddMM fix is directly relevant to the runner rather than only a general dependency refresh. The current call path is: - `x/models/nn/nn.go`: `Linear.Forward` uses `l.Bias.Addmm(x, w, 1.0, 1.0)` for dense linear layers with bias. - `x/mlxrunner/mlx/nn.go`: the MLX runner `Linear.Forward` also uses `m.Bias.Addmm(x, w, 1.0, 1.0)`. - `x/mlxrunner/mlx/ops.go`: `Array.Addmm` maps that Go call to `C.mlx_addmm(...)`. - `x/imagegen/mlx/mlx.go`: imagegen also exposes `AddMM`, which calls `C.mlx_addmm(...)`. Without this update, those biased MLX linear paths can drop the broadcast bias for batched inputs. Updating MLX/MLX-C pulls in the upstream fix at the dependency layer, preserving Ollama’s existing AddMM call sites instead of carrying an Ollama-side `matmul + add` workaround. ## Details Pin updates: - `MLX_VERSION`: `38ad257088fb2193ad47e527cf6534a689f30943` -> `68cf2fddd8de5edd8ab3d926391772b2e2cedad8` - `MLX_C_VERSION`: `0726ca922fc902c4c61ef9c27d94132be418e945` -> `fba4470b89073180056c9ea46c443051375f7399` The CMake MLX build regenerated the vendored MLX-C headers and Go binding files under: - `x/mlxrunner/mlx/generated.*` - `x/mlxrunner/mlx/include/mlx/c/*` - `x/imagegen/mlx/mlx.*` MLX 0.31 also makes default streams and Metal command encoders thread-local. Since Go goroutines can move between OS threads, a lazy MLX graph can otherwise be built using one thread’s stream and later evaluated on another thread. That can fail with errors like: ```text mlx: There is no Stream(gpu, N) in current thread ``` To match MLX 0.31’s thread-local stream model, this PR updates the Go wrapper so MLX work pins the current goroutine to its OS thread and resolves the default stream per thread instead of caching it globally. The OS-thread pin is intentionally not paired with runtime.UnlockOSThread(): MLX lazy graph construction and later evaluation need to stay on the same OS thread for the lifetime of that MLX goroutine. ## Update: macOS app packaging note Update after app-bundle testing: MLX 0.31.2 also changes the runtime library layout: JACCL is now a standalone library. On macOS arm64, `libmlx.dylib` depends on `@rpath/libjaccl.dylib`. This matters for the signed `Ollama.app` bundle. The app could package `libmlx.dylib`, `libmlxc.dylib`, and `mlx.metallib`, but still fail to load MLX at runtime if `libjaccl.dylib` was not installed into the same `mlx_metal_v*` resource directory. Observed failure mode: ```text MLX not available: failed to load MLX dynamic library ``` with logs showing libmlxc.dylib was found under: /Applications/Ollama.app/Contents/Resources/mlx_metal_v3/libmlxc.dylib The fix installs the jaccl target as part of the MLX CMake install component and ensures the app packaging step carries auxiliary MLX runtime dylibs, such as libjaccl.dylib, into Contents/ Resources/mlx_metal_v*/. ## Regression Test Adds: ```sh go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v ``` The test uses zero input to isolate the broadcast-bias term. On the old MLX version, bias.Addmm(x, w, 1, 1) matches matmul-only, proving the bias is dropped. With MLX 0.31.2, addmm matches matmul + add. Reviewer reproduction flow: ```sh git checkout <test-commit> ./scripts/build_darwin.sh -a arm64 build go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v ``` Expected old behavior: ```text max diff addmm vs matmul+add: 3.000000 max diff addmm vs matmul-only: 0.000000 ``` Then: ```sh git checkout <mlx-upgrade-commit> ./scripts/build_darwin.sh -a arm64 build go test ./x/mlxrunner/mlx -run TestAddmmBroadcastBiasMatchesMatmulAdd -count=1 -v ``` Expected new behavior: ```text max diff addmm vs matmul+add: 0.000000 max diff addmm vs matmul-only: 3.000000 ``` ## Validation - Added TestAddmmBroadcastBiasMatchesMatmulAdd. - Verified the test fails on the pre-MLX-0.31.2 commit. - Verified the test passes after the MLX 0.31.2 update. - In the Linux container, the same targeted go test command passes by skipping MLX runtime because the macOS MLX dylib is unavailable. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
GiteaMirror added the pull-request label 2026-05-05 10:16:35 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#77605