[PR #15409] [MERGED] mlx: mixed-precision quant and capability detection improvements #61841

Closed
opened 2026-04-29 16:50:46 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15409
Author: @dhiltgen
Created: 4/8/2026
Status: Merged
Merged: 4/13/2026
Merged by: @dhiltgen

Base: main ← Head: mlx-create-improvements


📝 Commits (1)

  • b80896a mlx: mixed-precision quant and capability detection improvements

📊 Changes

7 files changed (+368 additions, -87 deletions)


📝 x/create/client/create.go (+23 -13)
📝 x/create/client/create_test.go (+89 -1)
📝 x/create/client/quantize.go (+91 -28)
📝 x/create/create.go (+42 -45)
📝 x/create/create_test.go (+98 -0)
📝 x/mlxrunner/mlx/mlx.go (+7 -0)
📝 x/mlxrunner/model/root.go (+18 -0)

📄 Description

Improve the MLX model creation pipeline with several model-agnostic changes:

  • Rewrite supportsVision to use vision_config instead of the architecture name
  • Add supportsAudio for audio encoder detection
  • Add alignment checking (isAligned) for quantization group sizes
  • Support per-projection mixed quantization in MoE expert packing
  • Record per-tensor quant metadata in safetensors blobs
  • Parse per-tensor quant metadata at model load time
  • Validate quantize output is non-empty before storing
  • Fix pin/unpin cleanup in expert group quantization
  • Promote v_proj/k_proj/down_proj to INT8 for INT4 base quant
  • Add MetalIsAvailable() utility
  • Skip audio encoder tensors from quantization

Extracted from #15244
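As a rough illustration of the config-based capability detection described above, here is a minimal Go sketch. The config key names (`vision_config`, `audio_config`) and function shapes are assumptions for illustration only, not the PR's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// supportsVision reports whether the model's config declares a vision
// encoder, rather than matching on the architecture name.
// (Hypothetical sketch; key name assumed.)
func supportsVision(rawConfig []byte) bool {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(rawConfig, &cfg); err != nil {
		return false
	}
	_, ok := cfg["vision_config"]
	return ok
}

// supportsAudio reports whether the config declares an audio encoder.
// (Hypothetical sketch; key name assumed.)
func supportsAudio(rawConfig []byte) bool {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(rawConfig, &cfg); err != nil {
		return false
	}
	_, ok := cfg["audio_config"]
	return ok
}

func main() {
	vision := []byte(`{"architectures":["SomeModel"],"vision_config":{"hidden_size":1024}}`)
	textOnly := []byte(`{"architectures":["SomeModel"]}`)
	fmt.Println(supportsVision(vision), supportsVision(textOnly))
}
```

Probing the config for a sub-section key is what makes the check model-agnostic: new multimodal architectures work without updating a name list.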

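The alignment check and mixed-precision promotion can be sketched the same way. `isAligned` and `bitsFor` are hypothetical names, and the suffix matching is an assumption about how tensor names are compared; the promoted projections (v_proj, k_proj, down_proj at INT8 over an INT4 base) follow the description above:

```go
package main

import (
	"fmt"
	"strings"
)

// isAligned reports whether a tensor's last dimension divides evenly into
// the quantization group size, so no group is partially filled.
func isAligned(lastDim, groupSize int) bool {
	return groupSize > 0 && lastDim%groupSize == 0
}

// bitsFor picks a per-tensor bit width: with an INT4 base quant,
// precision-sensitive projections are promoted to INT8.
// (Illustrative rule set, not the PR's actual code.)
func bitsFor(name string, baseBits int) int {
	if baseBits != 4 {
		return baseBits
	}
	for _, proj := range []string{"v_proj", "k_proj", "down_proj"} {
		if strings.HasSuffix(name, proj) || strings.HasSuffix(name, proj+".weight") {
			return 8
		}
	}
	return baseBits
}

func main() {
	fmt.Println(isAligned(4096, 64), isAligned(100, 64))
	fmt.Println(bitsFor("model.layers.0.self_attn.v_proj.weight", 4))
	fmt.Println(bitsFor("model.layers.0.self_attn.q_proj.weight", 4))
}
```

Recording the chosen bit width per tensor (as the PR does in safetensors metadata) is what lets the loader reconstruct a mixed-precision model without guessing each tensor's format.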

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 16:50:46 -05:00

Reference: github-starred/ollama#61841