[PR #13870] [MERGED] Re-apply "model: add MLA absorption for glm4moelite" with fix #45675

Closed
opened 2026-04-25 01:19:45 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13870
Author: @jmorganca
Created: 1/24/2026
Status: Merged
Merged: 1/24/2026
Merged by: @jmorganca

Base: main ← Head: revert-13869-revert-13810-glm4moelite-mla-absorption


📝 Commits (2)

  • 639d882 Revert "Revert "model: add MLA absorption for glm4moelite (#13810)" (#13869)"
  • 9b5b439 ggml: fix zero-sized array error in fattn-tile nvidia_fp32 config

📊 Changes

16 files changed (+526 additions, -24 deletions)


📝 convert/convert_glm4moelite.go (+114 -0)
➕ llama/patches/0032-ggml-enable-MLA-flash-attention-for-GLM-4.7-flash.patch (+251 -0)
📝 ml/backend/ggml/ggml/src/ggml-cuda/fattn-mma-f16.cuh (+12 -3)
📝 ml/backend/ggml/ggml/src/ggml-cuda/fattn-tile.cuh (+17 -1)
📝 ml/backend/ggml/ggml/src/ggml-cuda/fattn.cu (+8 -4)
📝 ml/backend/ggml/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu (+1 -0)
📝 ml/backend/ggml/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu (+1 -0)
📝 ml/backend/ggml/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu (+1 -0)
📝 ml/backend/ggml/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu (+1 -0)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-device.m (+2 -6)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-embed.metal (+1 -0)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-ops.cpp (+1 -1)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal.metal (+1 -0)
📝 model/model.go (+14 -0)
📝 model/models/glm4moelite/model.go (+28 -9)
➕ model/models/glm4moelite/model_test.go (+73 -0)

📄 Description

No description provided


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 01:19:45 -05:00
Reference: github-starred/ollama#45675