[PR #12231] [MERGED] Flash attention & KV cache quantization validation fixes #60438

Closed
opened 2026-04-29 15:24:54 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12231
Author: @jessegross
Created: 9/9/2025
Status: Merged
Merged: 9/10/2025
Merged by: @jessegross

Base: main ← Head: jessegross/flash


📝 Commits (2)

  • 478aab4 llm: Remove unneeded warning with flash attention enabled
  • 3575a92 ggml: Disable flash attention for gemma2

📊 Changes

3 files changed (+13 additions, -5 deletions)

📝 fs/ggml/ggml.go (+11 -3)
📝 llm/memory.go (+1 -1)
📝 llm/server.go (+1 -1)

📄 Description

Our validation checks for flash attention and KV cache quantization currently enable these features on models that do not support them, and they also emit spurious warnings.
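As a rough illustration of the kind of check described here, the Go sketch below gates flash attention on model architecture and falls back to an unquantized KV cache when flash attention is unavailable. It is not the code from this PR. The names `flashAttentionDisabled`, `supportsFlashAttention`, and `resolveKVCacheType` are hypothetical. The only detail taken from the PR is that flash attention is disabled for gemma2; the cache type names (`f16`, `q8_0`, `q4_0`) and the rule that a quantized cache needs flash attention are assumptions for the example.

```go
package main

import "fmt"

// flashAttentionDisabled lists architectures with flash attention turned off.
// Only "gemma2" comes from the commit titles in this PR; the map itself is a
// hypothetical structure for this sketch.
var flashAttentionDisabled = map[string]bool{"gemma2": true}

// supportsFlashAttention is a hypothetical helper, not a function from the
// Ollama code base.
func supportsFlashAttention(arch string) bool {
	return !flashAttentionDisabled[arch]
}

// resolveKVCacheType returns the KV cache type to use and whether flash
// attention ends up enabled. Assumption for this sketch: quantized cache
// types are honored only when flash attention is active; everything else
// falls back to "f16".
func resolveKVCacheType(arch string, flashRequested bool, requested string) (kvType string, flash bool) {
	flash = flashRequested && supportsFlashAttention(arch)
	switch requested {
	case "q8_0", "q4_0":
		if flash {
			return requested, flash
		}
		// Quantization was requested but flash attention is off.
		return "f16", flash
	default:
		// Empty, "f16", or unrecognized values use the unquantized cache.
		return "f16", flash
	}
}

func main() {
	for _, arch := range []string{"llama", "gemma2"} {
		kvType, flash := resolveKVCacheType(arch, true, "q8_0")
		fmt.Printf("arch=%s flash=%t kv=%s\n", arch, flash, kvType)
	}
}
```

Keeping the architecture check and the cache type fallback in one place means a request for a quantized cache on an unsupported model degrades to `f16` in the same step that turns flash attention off, so the two settings cannot disagree.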


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 15:24:54 -05:00

Reference: github-starred/ollama#60438