[GH-ISSUE #15881] Flash Attention gating for deepseek2 appears controlled by GGUF metadata #72179

Open
opened 2026-05-05 03:35:52 -05:00 by GiteaMirror · 0 comments

Originally created by @gotnochill815-web on GitHub (Apr 29, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15881

It looks like Flash Attention support for GLM4.7-flash (unsloth version) is not enabled, even though similar models (e.g. using the `glmmoelite` arch) do support it.

From investigation in #15855:

  • Renaming the architecture does not enable FA
  • The difference appears to come from GGUF metadata rather than architecture
  • Models with the `deepseek2` arch do not trigger FA

This suggests FA eligibility is controlled by specific GGUF metadata fields.

Questions:

  • Which GGUF metadata keys determine whether Flash Attention is enabled?
  • Are there known constraints (attention type, head dims, KV layout, etc.) that block FA for `deepseek2`?

If this is just a metadata mismatch, aligning those fields might allow FA support without changing model weights.
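One way to narrow down which fields differ is to dump the key/value section of both GGUF files and diff them. Below is a minimal sketch of the fixed header and string layout, written from the public GGUF spec; it is a diagnostic aid, not a reproduction of Ollama's internal FA gating logic:

```python
import struct

def read_gguf_header(buf: bytes) -> dict:
    """Parse the fixed GGUF file header (per the GGUF spec):
    4-byte magic b'GGUF', uint32 version, uint64 tensor count,
    uint64 metadata key/value count, all little-endian."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}

def read_gguf_string(buf: bytes, offset: int):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes.
    Returns the decoded string and the offset just past it, so the
    caller can walk the metadata key/value section key by key."""
    (length,) = struct.unpack_from("<Q", buf, offset)
    start = offset + 8
    return buf[start:start + length].decode("utf-8"), start + length
```

Walking the KV section of both files with this and diffing the resulting key lists should reveal whether the `deepseek2` model is missing or mismatching attention-related keys (e.g. `general.architecture` and, presumably, per-arch keys such as `<arch>.attention.head_count` / `head_count_kv`; the exact gating keys are what this issue is asking about).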


Reference: github-starred/ollama#72179