[PR #4120] [MERGED] feat: add support for flash_attn #21930

Closed
opened 2026-04-19 15:57:41 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4120
Author: @sammcj
Created: 5/3/2024
Status: Merged
Merged: 5/20/2024
Merged by: @jmorganca

Base: main ← Head: main


📝 Commits (8)

  • 7857efa feat: enable flash attention if supported
  • f8dbbee feat: enable flash attention if supported
  • 02c6e37 Merge branch 'main' into main
  • 69474d1 feat: enable flash attention if supported
  • e9ffccb Merge branch 'main' into main
  • 0c08573 feat: add flash_attn support
  • 9b134a9 Merge branch 'main' into main
  • 24c4dae Merge branch 'main' into main

📊 Changes

2 files changed (+28 additions, -3 deletions)

View changed files

📝 llm/ext_server/server.cpp (+11 -3)
📝 llm/server.go (+17 -0)

📄 Description

  • Add Flash Attention support #4051

Enabled by default only when a supported CUDA version or Metal is detected; configurable via params and the API.
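
For illustration only, here is a minimal Go sketch of the "autodetect with explicit override" behaviour described above. The helper name flashAttnArgs, the Options.FlashAttn field, and the assumption that the underlying llama.cpp server accepts a --flash-attn flag are hypothetical stand-ins, not the PR's actual code in llm/server.go.

```go
// Hypothetical sketch, not the actual ollama implementation: enable flash
// attention by default when a supported backend is detected, but let an
// explicit request/config option override the autodetected default.
package llm

import "runtime"

// Options stands in for the per-request parameters; the real field name and
// shape in ollama may differ.
type Options struct {
	FlashAttn *bool // nil means the caller did not set it
}

// flashAttnArgs returns extra server arguments, assuming (hedged) that the
// llama.cpp server binary accepts a "--flash-attn" flag.
func flashAttnArgs(opts Options, cudaSupported bool) []string {
	enabled := false
	switch {
	case opts.FlashAttn != nil:
		// An explicit param or API setting always wins over autodetection.
		enabled = *opts.FlashAttn
	case runtime.GOOS == "darwin":
		// Assume Metal supports flash attention.
		enabled = true
	case cudaSupported:
		// e.g. a sufficiently recent CUDA compute capability was detected.
		enabled = true
	}
	if enabled {
		return []string{"--flash-attn"}
	}
	return nil
}
```

The point of the sketch is the precedence order: hardware detection picks a sensible default, while an explicit parameter from the caller always overrides it.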

Credit to @wanderingmeow, who took my broken idea and made it work (https://github.com/ollama/ollama/issues/4051#issuecomment-2092430887) 🎉

Fixes #4051


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 15:57:41 -05:00

Reference: github-starred/ollama#21930