[PR #15311] [MERGED] Revert "enable flash attention for gemma4 (#15296)" #61807

Closed
opened 2026-04-29 16:49:14 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15311
Author: @dhiltgen
Created: 4/4/2026
Status: Merged
Merged: 4/4/2026
Merged by: @dhiltgen

Base: main ← Head: fa


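📝 Commits (1)

9e725f3 (https://github.com/ollama/ollama/commit/9e725f32dec43afe8730a5a000dc4c4c60e99b93) Revert "enable flash attention for gemma4 (#15296)"
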
📊 Changes

1 file changed (+0 additions, -1 deletions)

📝 fs/ggml/ggml.go (+0 -1)

📄 Description

This reverts commit c8e0878814b4d19200d65571d3d2d35b4b48fd3e.

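Fixes a performance regression: with flash attention enabled for gemma4 (0.20.1-flash), throughput drops by roughly 19% to 60% against the 0.20.0 base, while the build without it (0.20.1-no-flash) matches or slightly exceeds the base. Perf run comparison: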

                                    │ /tmp/0.20.0.log │       /tmp/0.20.1-flash.log        │     /tmp/0.20.1-no-flash.log      │
                                    │    token/sec    │  token/sec   vs base               │  token/sec   vs base              │
Model/name=gemma4:e2b/step=prefill        3.035k ± 1%   1.227k ± 2%  -59.57% (p=0.002 n=6)   3.040k ± 1%       ~ (p=0.310 n=6)
Model/name=gemma4:e2b/step=generate       144.92 ± 3%    95.09 ± 6%  -34.39% (p=0.002 n=6)   146.20 ± 2%  +0.89% (p=0.026 n=6)
Model/name=gemma4:e4b/step=prefill        1508.1 ± 5%    827.4 ± 3%  -45.14% (p=0.002 n=6)   1533.3 ± 2%       ~ (p=0.180 n=6)
Model/name=gemma4:e4b/step=generate        93.55 ± 1%    66.75 ± 5%  -28.64% (p=0.002 n=6)    94.75 ± 2%  +1.29% (p=0.026 n=6)
Model/name=gemma4:26b/step=prefill        1497.5 ± 1%    689.7 ± 1%  -53.95% (p=0.002 n=6)   1556.9 ± 1%  +3.97% (p=0.002 n=6)
Model/name=gemma4:26b/step=generate        86.95 ± 3%    70.63 ± 4%  -18.77% (p=0.002 n=6)    88.78 ± 1%  +2.09% (p=0.009 n=6)
geomean                                    447.9         260.7       -41.80%                  455.4       +1.67%

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 16:49:14 -05:00
Reference: github-starred/ollama#61807