[PR #12947] [CLOSED] ggml-cpu: Enable matrix-math accelerator for power10 #14008

opened 2026-04-13 00:42:30 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12947
Author: @shalinib-ibm
Created: 11/4/2025
Status: Closed

Base: main ← Head: enable_mma


📝 Commits (1)

  • a5b5c5b ggml-cpu: Enable matrix math accelerator for power10

📊 Changes

2 files changed (+14 additions, -0 deletions)


ml/backend/ggml/ggml/src/ggml-cpu/llamafile/llamafile_ppc64le_power10.go (+7 -0)
ml/backend/ggml/ggml/src/ggml-cpu/llamafile/llamafile_ppc64le_power9.go (+7 -0)

📄 Description

Adding -mcpu=power10 improves matrix multiplication performance when running Ollama on PowerPC-based hardware. The flag needs to be added in llamafile.go so that the PowerPC-optimized code (using Matrix Multiply Assist) for llamafile_sgemm is compiled in and available in the Ollama binary.

This change adds the -mcpu=power10 flag when building with the build tag ppc64le.power10, which enables MMA optimizations in the Ollama binary.

Likewise, the -mcpu=power9 flag is added when building with the build tag ppc64le.power9, which enables VSX optimizations in the Ollama binary.
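The two new files presumably gate the compiler flag behind the custom build tag using cgo directives. A minimal sketch of what llamafile_ppc64le_power10.go might contain (the package name and exact directives are assumptions based on the description, not the actual patch):

```go
//go:build ppc64le.power10

package llamafile

// Passing -mcpu=power10 to the C/C++ compiler enables the
// MMA-optimized llamafile_sgemm code paths in ggml-cpu.
// (Sketch only; the real file's contents may differ.)
// #cgo CFLAGS: -mcpu=power10
// #cgo CXXFLAGS: -mcpu=power10
import "C"
```

Building with the matching --tags value satisfies the constraint and compiles the ggml sources with the Power10 flag; without the tag, the file is excluded and the default flags apply.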

When building on a Power10 machine, use:
go build --tags ppc64le.power10 .

When building on a Power9 machine, use:
go build --tags ppc64le.power9 .

Performance Impact:

Improved performance on Power10 chips for Q4_0, Q8_0, FP32, and BF16 models. Inference time with ollama run llama3:8b (Q4_0 model) is ~30% lower for a 50-word summarization of a 512-token prompt:
with MMA enabled: 6.05 sec
without MMA (base): 8.45 sec


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:42:30 -05:00

Reference: github-starred/ollama#14008