[GH-ISSUE #4718] Ensuring Flash Attention Support in Official Docker Images by Setting Environment Variables #49486

Closed
opened 2026-04-28 12:00:25 -05:00 by GiteaMirror · 1 comment

Originally created by @00010110 on GitHub (May 30, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4718

What is the issue?

Does the official Docker image support Flash Attention at runtime? Is it sufficient to set the OLLAMA_FLASH_ATTENTION environment variable to enable it?

OS

No response

GPU

No response

CPU

No response

Ollama version

v0.1.39

GiteaMirror added the bug label 2026-04-28 12:00:25 -05:00

@jmorganca commented on GitHub (May 30, 2024):

Yes, setting OLLAMA_FLASH_ATTENTION=1 will enable it. Keep in mind that only GPUs with compute capability 7+ support flash attention.
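
For reference, a minimal way to pass the variable to the official image (the flags mirror the standard Ollama Docker run command; the container name and volume mount are illustrative):

```shell
# Run the official image with flash attention enabled via the environment
# variable. --gpus=all requires the NVIDIA Container Toolkit; the named
# volume persists downloaded models across container restarts.
docker run -d --gpus=all \
  -e OLLAMA_FLASH_ATTENTION=1 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```

You can confirm the variable is set inside the running container with `docker exec ollama env | grep OLLAMA_FLASH_ATTENTION`.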


Reference: github-starred/ollama#49486