[PR #9118] [CLOSED] Draft MLX go backend for new engine #44114

Closed
opened 2026-04-24 23:38:48 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9118
Author: @dhiltgen
Created: 2/14/2025
Status: Closed

Base: main ← Head: next-mlx


📝 Commits (2)

  • 1296b39 Row Order model definitions
  • d1a7607 Add MLX Backend POC

📊 Changes

24 files changed (+2208 additions, -272 deletions)

View changed files

📝 .gitignore (+2 -0)
📝 CMakeLists.txt (+5 -0)
📝 kvcache/causal.go (+90 -59)
📝 kvcache/causal_test.go (+59 -30)
📝 llm/status.go (+1 -0)
📝 ml/backend.go (+55 -4)
📝 ml/backend/backend.go (+1 -0)
📝 ml/backend/ggml/ggml.go (+285 -87)
➕ ml/backend/mlx/CMakeLists.txt (+36 -0)
➕ ml/backend/mlx/mlx.go (+1099 -0)
➕ ml/backend/mlx/quant.go (+328 -0)
📝 ml/nn/attention.go (+10 -10)
📝 ml/nn/linear.go (+3 -1)
➕ model/README.md (+62 -0)
📝 model/model.go (+11 -0)
📝 model/models/gemma2/model.go (+11 -11)
📝 model/models/gemma3/model.go (+6 -6)
📝 model/models/gemma3/model_text.go (+10 -10)
📝 model/models/gemma3/model_vision.go (+5 -7)
📝 model/models/llama/model.go (+8 -9)

...and 4 more files

📄 Description

Replaces #8490 on main.

Carries #9115 which should merge first.

A few key points:

  • Q/K tensor adjustments are applied globally, which is incorrect (they should be model-specific)
  • The cache implementation on MLX is partially functional, but seems to drift after multiple forward passes and needs more work
  • Only llama3 fp16 models load currently; more work is needed to get the other models working and to support more quantizations
  • A temporary environment variable, OLLAMA_BACKEND, toggles the backend; set it to ggml or mlx

To see it working:

cmake -S . -B build
cmake --build build -j
go build .
OLLAMA_NEW_ENGINE=1 OLLAMA_BACKEND=mlx ollama serve

Then

ollama run llama3.1:8b-instruct-fp16

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-24 23:38:48 -05:00

Reference: github-starred/ollama#44114