[PR #15587] [MERGED] mlx: Improve gemma4 performance with fused operations #77513

Closed
opened 2026-05-05 10:11:06 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15587
Author: @dhiltgen
Created: 4/14/2026
Status: Merged
Merged: 4/15/2026
Merged by: @dhiltgen

Base: main ← Head: gemma4-mlx-perf


📝 Commits (2)

  • abe85fa mlx: Improve gemma4 performance with fused operations
  • 7e8191a review comments

📊 Changes

2 files changed (+29 additions, -15 deletions)

📝 x/mlxrunner/mlx/act.go (+20 -0)
📝 x/models/gemma4/gemma4.go (+9 -15)

📄 Description

go run cmd/bench/bench.go -model gemma4:XXX-nvfp4 -prompt-tokens 2048 -max-tokens 128 -epochs 5 -warmup 1

  ┌──────┬─────────┬──────────┬────────────────┬────────┐
  │ Size │ Metric  │ Baseline │ New (compiled) │   Δ    │
  ├──────┼─────────┼──────────┼────────────────┼────────┤
  │ E2B  │ prefill │   18,777 │         21,901 │ +16.6% │
  ├──────┼─────────┼──────────┼────────────────┼────────┤
  │ E2B  │ gen     │    153.8 │          176.9 │ +15.0% │
  ├──────┼─────────┼──────────┼────────────────┼────────┤
  │ E4B  │ prefill │    6,980 │          8,086 │ +15.8% │
  ├──────┼─────────┼──────────┼────────────────┼────────┤
  │ E4B  │ gen     │     99.1 │          110.6 │ +11.6% │
  ├──────┼─────────┼──────────┼────────────────┼────────┤
  │ 26B  │ prefill │    3,957 │          4,372 │ +10.5% │
  ├──────┼─────────┼──────────┼────────────────┼────────┤
  │ 26B  │ gen     │    101.3 │          107.5 │  +6.1% │
  ├──────┼─────────┼──────────┼────────────────┼────────┤
  │ 31B  │ prefill │      531 │            593 │ +11.7% │
  ├──────┼─────────┼──────────┼────────────────┼────────┤
  │ 31B  │ gen     │     21.4 │           22.4 │  +4.8% │
  └──────┴─────────┴──────────┴────────────────┴────────┘

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:11:06 -05:00

Reference: github-starred/ollama#77513