[PR #13052] [MERGED] flash attn: add auto mode for llama engine #14048

Closed · opened 2026-04-13 00:43:22 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13052
Author: @dhiltgen
Created: 11/11/2025
Status: Merged
Merged: 12/12/2025
Merged by: @dhiltgen

Base: main ← Head: fa_auto


📝 Commits (3)

  • 6de1889 flash attn: add auto mode for llama engine
  • 21de4e8 review comments
  • 595d318 ensure kv cache quantized types have FA explicitly enabled

📊 Changes

7 files changed (+100 additions, -24 deletions)

View changed files

📝 fs/ggml/ggml.go (+11 -2)
📝 llama/llama.go (+8 -5)
📝 llm/server.go (+49 -12)
📝 ml/backend.go (+1 -1)
📝 ml/backend/ggml/ggml.go (+3 -3)
📝 ml/device.go (+26 -0)
📝 runner/llamarunner/runner.go (+2 -1)

📄 Description

If the user does not specify flash attention (`fa`) in the environment, use auto mode.

Draft until I can verify more models.
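The behavior described above can be sketched in Go. This is a minimal, hypothetical illustration, not the PR's actual code: the function name `resolveFlashAttn` and its signature are invented for this sketch. It assumes three resolved states ("on", "off", "auto") and, following the commit "ensure kv cache quantized types have FA explicitly enabled", treats a quantized KV cache type as requiring flash attention to be explicitly turned on rather than left to auto mode.

```go
package main

import "fmt"

// resolveFlashAttn sketches how a flash-attention setting might be resolved.
// envVal is the user's environment setting ("" when unset); kvCacheType is
// the KV cache type (e.g. "f16", "q8_0"). Names and semantics here are
// illustrative assumptions, not the actual ollama implementation.
func resolveFlashAttn(envVal, kvCacheType string) (string, error) {
	// Any non-f16 KV cache type is treated as quantized in this sketch.
	quantizedKV := kvCacheType != "" && kvCacheType != "f16"

	switch envVal {
	case "1", "true":
		return "on", nil
	case "0", "false":
		if quantizedKV {
			return "", fmt.Errorf("quantized KV cache %q requires flash attention", kvCacheType)
		}
		return "off", nil
	case "":
		// Unset: fall back to auto mode, letting the backend decide
		// per model and device -- unless the KV cache is quantized,
		// which (per the commit message) needs FA explicitly enabled.
		if quantizedKV {
			return "", fmt.Errorf("quantized KV cache %q requires flash attention to be explicitly enabled", kvCacheType)
		}
		return "auto", nil
	default:
		return "", fmt.Errorf("invalid flash attention setting %q", envVal)
	}
}

func main() {
	mode, err := resolveFlashAttn("", "f16")
	fmt.Println(mode, err)
}
```

In this reading, "auto" is only reachable when the user has expressed no preference and nothing about the configuration forces a decision; explicit settings and quantized-KV constraints always win.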


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:43:22 -05:00

Reference: github-starred/ollama#14048