[PR #15581] ggml-metal: fix mixed bf16/f16 cooperative tensor operand order #41088

Open
opened 2026-04-23 01:49:19 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15581
Author: @Horacehxw
Created: 4/14/2026
Status: 🔄 Open

Base: main ← Head: fix-metal-cooperative-tensor-order


📝 Commits (2)

  • a542fdc fix(metal): match cooperative tensor operand order
  • c95a23a fix(ci): carry cooperative tensor fix in llama patch stack

📊 Changes

3 files changed (+37 additions, -4 deletions)


llama/patches/0037-ggml-metal-match-cooperative-tensor-operand-order.patch (+33 -0)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-embed.metal (+2 -2)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal.metal (+2 -2)

📄 Description

Summary

This fixes a mixed bf16/f16 cooperative tensor compile failure in the Metal backend on Apple M5 / Metal 4 systems.

The failure shows up as:

```text
static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>'
"Input types must match cooperative tensor types"
```

Root cause

In both kernel_mul_mm and kernel_mul_mm_id, the destination cooperative tensor is instantiated as if the operand order were (tA, tB):

```cpp
auto cT = mm.get_destination_cooperative_tensor<decltype(tA), decltype(tB), float>();
```

but the actual matmul call uses (sB, sA, cT):

```cpp
mm.run(sB, sA, cT);
```

That mismatch is benign for same-type pairs, but it breaks mixed bf16/f16 cooperative tensor instantiations under the stricter MetalPerformancePrimitives checks on Apple10 / Metal 4.

This patch makes the destination tensor type match the actual operand order passed to mm.run.

Scope

Files changed:

  • ml/backend/ggml/ggml/src/ggml-metal/ggml-metal.metal
  • ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-embed.metal

Relationship to existing work

This addresses the same user-visible failure as #14432.

It is also related to #14604, but takes a different approach:

  • #14604 works around the problem by removing the mixed bf16/f16 kernels and forcing fallback behavior
  • this PR fixes the operand-order mismatch directly, so the mixed cooperative tensor path can compile and run

Verification

Tested locally on:

  • Apple M5 Pro
  • macOS 26.4
  • latest main as of 2026-04-15

Reproduction on unpatched main

Built `dist/darwin-arm64/ollama`, then ran:

```bash
env -u GGML_METAL_TENSOR_DISABLE \
  ./dist/darwin-arm64/ollama runner --ollama-engine \
  --model ~/.ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df \
  --port 64150

curl http://127.0.0.1:64150/info
```

Result:

  • runner crashes during tensor probe / Metal library initialization
  • stderr contains the `Input types must match cooperative tensor types` static_assert from `MPPTensorOpsMatMul2dImpl.h`

Verification on this patch

After rebuilding with this change, the same /info request succeeds.

Observed runner log:

```text
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_init: loaded in 7.085 sec
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = true
```

I also verified model load on the patched runner using the same Gemma 4 26B blob:

  • fit succeeds
  • alloc succeeds
  • commit succeeds
  • /health reaches {"status":0,"progress":1}
  • log shows `offloaded 31/31 layers to GPU`

This was verified with GGML_METAL_TENSOR_DISABLE unset.

Fixes #14432.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#41088