[PR #13874] [MERGED] llama: fix CUDA release build issues #14430

Closed · opened 2026-04-13 00:53:50 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13874
Author: @jmorganca
Created: 1/24/2026
Status: Merged
Merged: 1/24/2026
Merged by: @jmorganca

Base: main ← Head: fix-cuda-fattn-mma


📝 Commits (1)

  • 58bc945 ggml-cuda: fix fattn-mma-f16 build for GLM 4.7 flash

📊 Changes

2 files changed (+111 additions, -37 deletions)

View changed files

📝 llama/patches/0032-ggml-enable-MLA-flash-attention-for-GLM-4.7-flash.patch (+86 -27)
📝 ml/backend/ggml/ggml/src/ggml-cuda/fattn-mma-f16.cuh (+25 -10)

📄 Description

Tested in a CUDA 12 environment across the older architectures that were causing the build failures on main.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:53:50 -05:00

Reference: github-starred/ollama#14430