[PR #14054] [CLOSED] ggml: add MLA flash attention config for GLM-4.7-flash #14487

Closed
opened 2026-04-13 00:55:40 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14054
Author: @jmorganca
Created: 2/3/2026
Status: Closed

Base: main ← Head: fix-glm-4.7-flash-mla-config


📝 Commits (1)

  • 55746e3 ggml: add MLA flash attention config for GLM-4.7-flash

📊 Changes

2 files changed (+72 additions, -3 deletions)

View changed files

➕ llama/patches/0032-ggml-add-MLA-flash-attention-config-for-GLM-4.7-flas.patch (+64 -0)
📝 ml/backend/ggml/ggml/src/ggml-cuda/fattn-mma-f16.cuh (+8 -3)

📄 Description

GLM-4.7-flash has gqa_ratio=20, which requires ncols=4 support in the 576x512 (MLA) flash attention kernel. The kernel selection logic and extern declarations already exist, but the corresponding CONFIG_CASE entries were missing, causing a performance regression.

Fixes #14045


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:55:40 -05:00

Reference: github-starred/ollama#14487