[PR #15244] Gemma4 on MLX #15091

Open
opened 2026-04-13 01:10:09 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15244
Author: @dhiltgen
Created: 4/2/2026
Status: 🔄 Open

Base: main ← Head: gemma4-mlx


📝 Commits (8)

  • 9c24982 mlx: add op wrappers for Conv2d, Pad, activations, trig, and masked SDPA
  • 818ce1b review comments
  • 2e5a5cb mlx: fix ScaledDotProductAttentionMasked to consult the mask argument
  • 8189a73 testutil: add MLX model porting and perplexity validation tooling
  • f3a155f qwen3_5: add forward-pass per-position test against PyTorch reference
  • da2b1c4 gemma4: implement Gemma 4 model for MLX (text-only runtime)
  • 08cba1e mlx: Support NVIDIA TensorRT Model Optimizer import
  • 1927d36 gemma4: add forward, perplexity, and tokenizer test fixtures

📊 Changes

34 files changed (+7113 additions, -113 deletions)

View changed files

x/cmd/ppl/README.md (+52 -0)
x/cmd/ppl/ppl.go (+403 -0)
📝 x/create/client/create.go (+35 -13)
📝 x/create/client/create_test.go (+2 -1)
📝 x/create/client/quantize.go (+92 -29)
📝 x/create/create.go (+193 -17)
📝 x/create/dtype.go (+4 -0)
x/create/gemma4.go (+264 -0)
x/create/gemma4_test.go (+191 -0)
📝 x/imagegen/safetensors/safetensors.go (+2 -0)
📝 x/mlxrunner/imports.go (+1 -0)
📝 x/mlxrunner/mlx/act.go (+51 -10)
📝 x/mlxrunner/mlx/mlx.go (+7 -0)
📝 x/mlxrunner/mlx/ops_extra.go (+198 -28)
📝 x/mlxrunner/model/linear.go (+16 -7)
📝 x/mlxrunner/model/root.go (+18 -0)
x/models/PORTING_GUIDE.md (+133 -0)
x/models/gemma4/forward_test.go (+205 -0)
x/models/gemma4/gemma4.go (+1546 -0)
x/models/gemma4/gemma4_moe_test.go (+118 -0)

...and 14 more files

📄 Description

Port the Gemma4 model to the MLX engine. Text only initially.

For testing, use the models uploaded to https://ollama.com/dhiltgen/gemma4 (final weights as published by Google, but the quantization strategy is still being tuned, so the uploads are subject to change).

Carries:

  • #15409
  • #15120
  • #14913

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 01:10:09 -05:00

Reference: github-starred/ollama#15091