[PR #13672] fix: check ComputeMajor instead of DriverMajor for flash attention support #24869

Open
opened 2026-04-19 17:51:15 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13672
Author: @dmealing
Created: 1/11/2026
Status: 🔄 Open

Base: main ← Head: fix/flash-attention-compute-capability


📝 Commits (1)

  • 4286ceb fix: check ComputeMajor instead of DriverMajor for flash attention support

📊 Changes

1 file changed (+1 addition, -1 deletion)


📝 ml/device.go (+1 -1)

📄 Description

Summary

Fixes a bug where FlashAttentionSupported() checks gpu.DriverMajor (CUDA driver version) instead of gpu.ComputeMajor (GPU compute capability), causing crashes on older NVIDIA GPUs.

Problem

On Maxwell-era GPUs like the GTX 970:

  • Driver version: 12.2 (passes the >= 7 check ✅)
  • Compute capability: 5.2 (should fail the check ❌)

This causes flash attention to be incorrectly enabled on GPUs that don't support it, resulting in crashes with "exit status 2" when running any model on Ollama 0.12.6+.
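To make the mismatch concrete, here is a minimal sketch of how the current check evaluates for a GTX 970. The simplified gpuInfo struct is only illustrative and is not the real device-info type in ml/device.go, though the field names match the diff below:

```go
package main

import "fmt"

// gpuInfo is a simplified stand-in for the device info used by the check;
// only the fields referenced in the condition are included here.
type gpuInfo struct {
	Library      string
	DriverMajor  int
	ComputeMajor int
	ComputeMinor int
}

func main() {
	// Values reported for a GTX 970 with CUDA driver 12.2 installed.
	gpu := gpuInfo{Library: "CUDA", DriverMajor: 12, ComputeMajor: 5, ComputeMinor: 2}

	// Current check: compares the driver version, not the compute capability,
	// so it passes on Maxwell-era hardware.
	ok := gpu.Library == "CUDA" && gpu.DriverMajor >= 7 &&
		!(gpu.ComputeMajor == 7 && gpu.ComputeMinor == 2)
	fmt.Println(ok) // true: flash attention gets enabled on a CC 5.2 GPU and crashes
}
```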

The Fix

- (gpu.Library == "CUDA" && gpu.DriverMajor >= 7 && !(gpu.ComputeMajor == 7 && gpu.ComputeMinor == 2)) ||
+ (gpu.Library == "CUDA" && gpu.ComputeMajor >= 7 && !(gpu.ComputeMajor == 7 && gpu.ComputeMinor == 2)) ||

Flash attention on CUDA requires compute capability 7.0+ (Volta architecture and newer). The existing exclusion for CC 7.2 (Jetson Xavier) is preserved.
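For illustration, a minimal sketch of the corrected CUDA branch follows. The helper name and simplified struct are hypothetical; the real FlashAttentionSupported() covers other backends and takes different inputs, and only the condition itself comes from the diff above:

```go
package main

import "fmt"

// gpuInfo is a simplified stand-in for the device info in ml/device.go.
type gpuInfo struct {
	Library      string
	ComputeMajor int
	ComputeMinor int
}

// cudaFlashAttentionOK mirrors the fixed condition: compute capability 7.0+
// (Volta and newer), except CC 7.2 (Jetson Xavier), which stays excluded.
func cudaFlashAttentionOK(gpu gpuInfo) bool {
	return gpu.Library == "CUDA" && gpu.ComputeMajor >= 7 &&
		!(gpu.ComputeMajor == 7 && gpu.ComputeMinor == 2)
}

func main() {
	fmt.Println(cudaFlashAttentionOK(gpuInfo{"CUDA", 5, 2})) // false: GTX 970 (Maxwell) now rejected
	fmt.Println(cudaFlashAttentionOK(gpuInfo{"CUDA", 7, 2})) // false: Jetson Xavier exclusion preserved
	fmt.Println(cudaFlashAttentionOK(gpuInfo{"CUDA", 8, 6})) // true: Ampere-class GPUs still supported
}
```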

Affected GPUs

  • Maxwell (CC 5.x): GTX 970, GTX 980, GTX Titan X, etc.
  • Pascal (CC 6.x): GTX 1080, GTX 1070, GTX 1060, etc.

Testing

Tested on NVIDIA GeForce GTX 970 (CC 5.2) with CUDA driver 12.2:

  • Before fix: Crashes with "exit status 2" on any model
  • After fix: Works correctly (flash attention auto-disabled)

Workaround for affected users

Until this fix is released, set OLLAMA_FLASH_ATTENTION=false in your environment or systemd service file.
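For example, on a systemd-managed Linux install (assuming the standard service name ollama.service), run `systemctl edit ollama.service`, add `Environment="OLLAMA_FLASH_ATTENTION=false"` under the `[Service]` section, and restart the service.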


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:51:15 -05:00

Reference: github-starred/ollama#24869