[PR #15659] [MERGED] mlx: fuse sigmoid router head in glm4_moe_lite #77547

Closed
opened 2026-05-05 10:13:21 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15659
Author: @jessegross
Created: 4/18/2026
Status: Merged
Merged: 4/20/2026
Merged by: @jessegross

Base: main ← Head: jessegross/compile


📝 Commits (1)

  • d8e055e (https://github.com/ollama/ollama/commit/d8e055e2f4b1459821bc215466d74bdf333c6da8) mlx: fuse sigmoid router head in glm4_moe_lite

📊 Changes

2 files changed (+28 additions, -6 deletions)


📝 x/mlxrunner/mlx/act.go (+22 -0)
📝 x/models/glm4_moe_lite/glm4_moe_lite.go (+6 -6)

📄 Description

DeepSeek-V2-style aux-loss-free routing computes sigmoid(gates) once but needs it twice: the raw sigmoid output is gathered after top-k, while the post-bias negation is the argpartition key. Fuse into a single multi-output Compiled kernel returning both, saving two launches on the routing path per token. Exposed as a general SigmoidRouter since the same pattern is shared across DeepSeek-V2 descendants.

Improves glm4.7 generation performance by approximately 1%.
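
To make the two-output pattern concrete, here is a minimal CPU sketch in plain Go. It is not the code from this PR and does not use the MLX bindings: `sigmoidRouterSketch` and `topK` are hypothetical names, the inputs are ordinary float slices, and a full sort stands in for argpartition. It only illustrates why one pass can produce both the raw sigmoid scores (gathered as expert weights) and the negated, bias-corrected scores (used as the selection key).

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sigmoidRouterSketch is a hypothetical stand-in for the fused kernel.
// A single pass over the gate logits yields both outputs:
//   scores: raw sigmoid(gates), later gathered as expert weights
//   negKey: -(sigmoid(gates) + bias), the key used to pick the top-k experts
func sigmoidRouterSketch(gates, bias []float64) (scores, negKey []float64) {
	scores = make([]float64, len(gates))
	negKey = make([]float64, len(gates))
	for i, g := range gates {
		s := 1 / (1 + math.Exp(-g))
		scores[i] = s
		negKey[i] = -(s + bias[i])
	}
	return scores, negKey
}

// topK returns the indices of the k smallest keys, which are the k largest
// biased scores because the keys are negated. Assumption: a stable full sort
// is used here for simplicity where a real router would use argpartition.
func topK(negKey []float64, k int) []int {
	idx := make([]int, len(negKey))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return negKey[idx[a]] < negKey[idx[b]]
	})
	return idx[:k]
}

func main() {
	gates := []float64{0, 2, -1, 1}
	bias := []float64{1, 0, 0, 0} // illustrative values only

	scores, negKey := sigmoidRouterSketch(gates, bias)
	for _, e := range topK(negKey, 2) {
		// The weight comes from the raw sigmoid, not the biased score.
		fmt.Printf("expert %d weight %.3f\n", e, scores[e])
	}
}
```

With these illustrative inputs the bias promotes expert 0 into the top two, yet its weight is still the unbiased sigmoid value. That is why the routing path needs both arrays, and why returning them from one kernel avoids recomputing or relaunching for the second.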


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:13:21 -05:00
Reference: github-starred/ollama#77547