[PR #8490] [CLOSED] Draft MLX go backend for new engine #23256

Closed
opened 2026-04-19 16:52:24 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/8490
Author: @dhiltgen
Created: 1/19/2025
Status: Closed

Base: `jessegross/new_runner` ← Head: `next-mlx`


📝 Commits (10+)

  • fddb4fc next
  • 35a36eb fix linter
  • 9e4dd6c refactor prcess text tests
  • 65e4872 add model/cmd/main.go as test
  • 0a449dd rm long_test
  • 034d64b model: benchmark bpe split
  • f5726b0 ggml-backend: Let GGML allocate context memory
  • 83c9c04 ggml-backend: Support graph computation that does not return an output
  • 6727e25 ggml-backend.go: Don't do async computation when returning data
  • c636bec backend: Don't return an error on Close

📊 Changes

87 files changed (+479298 additions, -521 deletions)

View changed files

📝 .gitignore (+2 -0)
📝 CMakeLists.txt (+5 -0)
📝 cmd/cmd.go (+5 -2)
📝 cmd/runner/main.go (+1 -1)
📝 convert/convert.go (+16 -16)
📝 convert/convert_bert.go (+5 -5)
📝 convert/convert_commandr.go (+5 -5)
📝 convert/convert_gemma.go (+5 -5)
📝 convert/convert_gemma2.go (+2 -4)
📝 convert/convert_gemma2_adapter.go (+5 -5)
📝 convert/convert_llama.go (+6 -6)
📝 convert/convert_llama_adapter.go (+5 -5)
📝 convert/convert_mixtral.go (+5 -5)
📝 convert/convert_phi3.go (+7 -7)
📝 convert/convert_qwen2.go (+5 -5)
📝 convert/convert_test.go (+6 -6)
📝 envconfig/config.go (+3 -0)
📝 fs/ggml/ggml.go (+111 -95)
📝 fs/ggml/gguf.go (+6 -7)
📝 fs/ggml/type.go (+2 -7)

...and 67 more files

📄 Description

Replaced by #9118 on main.

Updated to carry #8731, which should merge first, to implement row-order model definitions.

A few key points:

  • Q/K tensor adjustments are applied globally which is incorrect (should be model specific)
  • The cache implementation on MLX is partially functional, but seems to drift after multiple forward passes and needs more work
  • Needs performance tuning
  • A temporary env var, `OLLAMA_BACKEND`, toggles which backend is used; set it to `ggml` or `mlx`

To see it working:

```
cmake -S . -B build
cmake --build build -j
go build .
OLLAMA_BACKEND=mlx ollama serve
```

Then

```
ollama run llama3.1:8b-instruct-fp16
```

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 16:52:24 -05:00

Reference: github-starred/ollama#23256