[PR #15276] ggml: add -ffast-math to HIP build for CUDA parity #77394

Open
opened 2026-05-05 10:04:10 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15276
Author: @jodendaal
Created: 4/3/2026
Status: 🔄 Open

Base: main ← Head: ggml-hip-ffast-math


📝 Commits (1)

  • cd3e2a8 ggml: add -ffast-math to HIP build for CUDA parity

📊 Changes

1 file changed (+4 additions, -0 deletions)

📝 ml/backend/ggml/ggml/src/ggml-hip/CMakeLists.txt (+4 -0)

📄 Description

Summary

CUDA builds pass -use_fast_math to the compiler (set in ggml-cuda/CMakeLists.txt), but HIP builds have no equivalent flag. This PR appends -ffast-math to CMAKE_HIP_FLAGS in ggml-hip/CMakeLists.txt, which enables:

  • Fused multiply-add (FMA) instructions
  • Flushing denormals to zero
  • Fast approximations for math functions (exp, log, sin, cos, etc.)

This brings HIP/ROCm builds to parity with what CUDA already does.
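
To make the numerical effect of the first two items concrete, here is a minimal Python sketch that emulates them on the CPU. It is an illustration only, not what the HIP compiler emits: it uses 64-bit floats rather than the narrower types GPU kernels typically use, and the helper names `fused_multiply_add` and `flush_denormal` are hypothetical, not part of ggml or HIP.

```python
import math
import sys
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate FMA: compute a*b + c exactly, then round once to a float."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single, correctly rounded conversion


def flush_denormal(x: float) -> float:
    """Emulate flush-to-zero: replace subnormal values with a signed zero."""
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x


# Inputs chosen so the exact product 1 - 2**-54 is not representable.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = a * b + c                  # product rounds to 1.0 first, so the sum is 0.0
fused = fused_multiply_add(a, b, c)  # one rounding keeps the -2**-54 term

tiny = sys.float_info.min / 4        # a subnormal (denormal) double
flushed = flush_denormal(tiny)

print(f"unfused={unfused!r} fused={fused!r}")
print(f"tiny={tiny!r} flushed={flushed!r}")
```

The two results differ only in the last bits, which is why such flags change outputs slightly rather than visibly.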

Benchmarks

Tested on AMD RX 7900 XTX (gfx1100, RDNA3) with ROCm 6.4, using qwen2.5-coder:14b (Q4_K_M):

| Benchmark | Baseline (gen t/s) | With -ffast-math (gen t/s) | Change |
|---|---|---|---|
| code-gen | 60.81 | 64.20 | +5.6% |
| short-explanation | 61.01 | 63.44 | +4.0% |
| list-generation | 61.59 | 63.68 | +3.4% |

Prompt evaluation throughput was unchanged.
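
For readers who want to check the Change column, the percentages can be recomputed from the reported throughput figures. The sketch below assumes the change is measured relative to the baseline; the names `RESULTS` and `percent_change` are illustrative only.

```python
# (benchmark, baseline gen t/s, -ffast-math gen t/s), copied from the results above
RESULTS = [
    ("code-gen", 60.81, 64.20),
    ("short-explanation", 61.01, 63.44),
    ("list-generation", 61.59, 63.68),
]


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Round to one decimal place, matching the precision used in the PR.
changes = {
    name: round(percent_change(base, fast), 1)
    for name, base, fast in RESULTS
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```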

Change

One line added to ml/backend/ggml/ggml/src/ggml-hip/CMakeLists.txt:

set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} -ffast-math")

Testing

  • Built and ran on ROCm 6.4, gfx1100
  • No correctness issues observed across multiple models and prompt types
  • Output quality is unchanged (fast math approximations are well within acceptable tolerance for inference)
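
One way to quantify "within acceptable tolerance" is to compare values produced by the two builds for the same input, for example logits. The following is a rough sketch under stated assumptions, not the procedure used in this PR: the function names, the tolerances, and the sample values are all placeholders.

```python
import math
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def within_tolerance(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-3,  # assumed tolerance, tune per model
    abs_tol: float = 1e-5,  # assumed tolerance, tune per model
) -> bool:
    """True if every element pair is close under the given tolerances."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Placeholder values for illustration only; these are not measurements.
baseline_logits = [2.5, -1.25, 0.5]
fast_math_logits = [2.5001, -1.2501, 0.5]

print("max abs diff:", max_abs_diff(baseline_logits, fast_math_logits))
print("within tolerance:", within_tolerance(baseline_logits, fast_math_logits))
```

A check like this would complement, not replace, the end-to-end observation that generated text looked unchanged.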

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:04:10 -05:00
Reference: github-starred/ollama#77394