[PR #15296] [MERGED] gemma4: enable flash attention #20376

Closed
opened 2026-04-16 07:34:39 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15296
Author: @dhiltgen
Created: 4/3/2026
Status: Merged
Merged: 4/3/2026
Merged by: @dhiltgen

Base: main ← Head: fix_cude_prediction


📝 Commits (1)

  • 67a4af3 enable flash attention for gemma4

📊 Changes

1 file changed (+1 addition, -0 deletions)

View changed files

📝 fs/ggml/ggml.go (+1 -0)

📄 Description

~~This patches additional code paths in the GGML CUDA backend for the memory prediction flow.~~

~~Fixes #15249~~

Enable flash attention for gemma4
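
The diff itself isn't included in the mirror, but a +1/-0 change that enables flash attention for a new architecture in `fs/ggml/ggml.go` typically amounts to adding the architecture name to an allow-list. Below is a minimal sketch of that pattern, assuming a hypothetical `flashAttnSupported` map and `SupportsFlashAttention` helper; the names and structure are illustrative, not the actual ollama code:

```go
package ggml

// flashAttnSupported lists model architectures for which the flash
// attention path is known to work. Hypothetical name; the real
// fs/ggml/ggml.go gates flash attention with its own logic.
var flashAttnSupported = map[string]bool{
	"llama":  true,
	"gemma3": true,
	"gemma4": true, // the one added line: enable flash attention for gemma4
}

// SupportsFlashAttention reports whether flash attention can be enabled
// for the given model architecture. Illustrative sketch only.
func SupportsFlashAttention(arch string) bool {
	return flashAttnSupported[arch]
}
```

An allow-list like this keeps flash attention off for architectures whose attention variants haven't been validated against the fused kernel, which is why new model families tend to be enabled one at a time in changes like this PR.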


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 07:34:39 -05:00
Reference: github-starred/ollama#20376