[PR #15403] [MERGED] gemma4: Disable FA on older CUDA GPUs #25678

Closed
opened 2026-04-19 18:20:57 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15403
Author: @dhiltgen
Created: 4/7/2026
Status: Merged
Merged: 4/7/2026
Merged by: @dhiltgen

Base: main ← Head: gemma4-fa


📝 Commits (1)

  • 18e246e gemma4: Disable FA on older GPUs where it doesn't work

📊 Changes

1 file changed (+12 additions, -0 deletions)


📝 llm/server.go (+12 -0)

📄 Description

CUDA GPUs with compute capability older than 7.5 lack the support needed to enable flash attention for this model.

```
/ml/backend/ggml/ggml/src/ggml-cuda/template-instances/../fattn-tile.cuh:1247: fatal error
/tmp/daniel_ollama_test/lib/ollama/libggml-base.so.0(+0x1bae8)[0x7f7dfc02cae8]
/tmp/daniel_ollama_test/lib/ollama/libggml-base.so.0(ggml_print_backtrace+0x1e6)[0x7f7dfc02ceb6]
/tmp/daniel_ollama_test/lib/ollama/libggml-base.so.0(ggml_abort+0x11d)[0x7f7dfc02d03d]
/tmp/daniel_ollama_test/lib/ollama/cuda_v12/libggml-cuda.so(_Z34ggml_cuda_flash_attn_ext_tile_caseILi512ELi512EEvR25ggml_backend_cuda_contextP11ggml_tensor+0x31a)[0x7f7d757fd3ca]
/tmp/daniel_ollama_test/lib/ollama/cuda_v12/libggml-cuda.so(+0x16bf9d)[0x7f7d7562cf9d]
/tmp/daniel_ollama_test/lib/ollama/cuda_v12/libggml-cuda.so(+0x16ed0a)[0x7f7d7562fd0a]
/tmp/daniel_ollama_test/lib/ollama/cuda_v12/libggml-cuda.so(+0x171e56)[0x7f7d75632e56]
/tmp/daniel_ollama_test/bin/ollama(+0x15856d1)[0x557f231926d1]
/tmp/daniel_ollama_test/bin/ollama(+0x14f9f8b)[0x557f23106f8b]
/tmp/daniel_ollama_test/bin/ollama(+0x4001a1)[0x557f2200d1a1]
SIGABRT: abort
```
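
The 12-line fix lands in llm/server.go. As a rough, hypothetical sketch of the gate the commit message describes (the type and function names below are illustrative, not the actual ollama code), the check might test each CUDA GPU's compute capability and force flash attention off below 7.5 (Turing):

```go
// Hypothetical sketch of the guard this PR describes: disable flash
// attention when a CUDA GPU predates compute capability 7.5 (Turing),
// since ggml's fattn-tile.cuh kernel aborts there. Names are illustrative,
// not taken from the actual diff.
package main

import "fmt"

// gpuInfo stands in for ollama's GPU discovery result.
type gpuInfo struct {
	Library      string // e.g. "cuda"
	ComputeMajor int
	ComputeMinor int
}

// supportsFlashAttention reports whether flash attention is safe to
// enable on the given GPU for this model family.
func supportsFlashAttention(g gpuInfo) bool {
	if g.Library != "cuda" {
		return true // the crash above is CUDA-specific
	}
	// Compute capability below 7.5 hits the aborting fattn-tile.cuh path.
	return g.ComputeMajor > 7 || (g.ComputeMajor == 7 && g.ComputeMinor >= 5)
}

func main() {
	gpus := []gpuInfo{{Library: "cuda", ComputeMajor: 6, ComputeMinor: 1}}
	flashAttn := true // requested setting
	for _, g := range gpus {
		if !supportsFlashAttention(g) {
			fmt.Printf("compute capability %d.%d predates 7.5; disabling flash attention\n",
				g.ComputeMajor, g.ComputeMinor)
			flashAttn = false
		}
	}
	_ = flashAttn // would be forwarded to the runner as a server option
}
```

Disabling the feature server-side, rather than patching the kernel, keeps the change small: an unsupported GPU never reaches the aborting code path in fattn-tile.cuh.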

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 18:20:57 -05:00
Reference: github-starred/ollama#25678