[GH-ISSUE #6953] AMD ROCm Card can not use flash attention #30161

Open
opened 2026-04-22 09:39:37 -05:00 by GiteaMirror · 6 comments

Originally created by @superligen on GitHub (Sep 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6953

What is the issue?

My card is a W7900 and the ROCm driver is 6.3. I found that the llama.cpp server started by Ollama is always launched without the -fa flag.

I checked the code and found:

	// only cuda (compute capability 7+) and metal support flash attention
	if g.Library != "metal" && (g.Library != "cuda" || g.DriverMajor < 7) {
		flashAttnEnabled = false
	}

This code seems wrong.

Ref: https://github.com/Dao-AILab/flash-attention/pull/1010 (support for ROCm has been added to FlashAttention 2).
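
For illustration only, here is a minimal sketch of how that check could be restructured so ROCm is no longer excluded unconditionally. The GpuInfo type, the "rocm" library name, and the idea of gating on a per-GPU capability check are assumptions for the sketch, not the actual Ollama implementation:

	package main

	import "fmt"

	// GpuInfo mirrors only the fields used in the snippet above; the real
	// Ollama struct is larger and may differ.
	type GpuInfo struct {
		Library     string
		DriverMajor int
	}

	// supportsFlashAttn is a hypothetical helper: instead of the blanket
	// "not metal and not cuda => off" rule, each backend opts in explicitly,
	// which leaves room to enable ROCm once the supported GPU set is known.
	func supportsFlashAttn(g GpuInfo) bool {
		switch g.Library {
		case "metal":
			return true
		case "cuda":
			// original rule: only compute capability 7+ qualifies
			return g.DriverMajor >= 7
		case "rocm":
			// assumption: a real patch would consult an allow list of gfx
			// architectures known to have working flash attention kernels
			return true
		default:
			return false
		}
	}

	func main() {
		g := GpuInfo{Library: "rocm", DriverMajor: 6}
		fmt.Println("flash attention supported:", supportsFlashAttn(g))
	}

The original check would then reduce to: if !supportsFlashAttn(g) { flashAttnEnabled = false }.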

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

No response

GiteaMirror added the feature request, amd labels 2026-04-22 09:39:37 -05:00

@dhiltgen commented on GitHub (Sep 25, 2024):

It looks like support may be limited to the MI200 and MI300 GPUs - https://github.com/ROCm/flash-attention?tab=readme-ov-file#amd-rocm-support


@awz commented on GitHub (Sep 30, 2024):

I would love to see this work with multiple w7900s. I have 3x w7900 + 1x 6800 working without flash attention. I want 4x w7900 working with flash attention.


@LunNova commented on GitHub (Oct 9, 2024):

May be fixed when/if https://github.com/Dao-AILab/flash-attention/pull/1203 gets merged.


@Crandel commented on GitHub (Dec 19, 2024):

May be fixed when/if Dao-AILab/flash-attention#1203 gets merged.

It was merged. Would it be possible to have it in Ollama?


@rtaic-coder commented on GitHub (Mar 28, 2025):

I would like this to be added as well.


@xNefas commented on GitHub (Aug 7, 2025):

Friendly reminder, if you find this issue important, leave an upvote, not a comment! Numbers can go a long way in open source motivation!

Reference: github-starred/ollama#30161