[PR #14417] [MERGED] Add qwen3.5-next-moe support to MLX runner and models #14669

Closed
opened 2026-04-13 01:00:24 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14417
Author: @pdevine
Created: 2/25/2026
Status: Merged
Merged: 3/4/2026
Merged by: @pdevine

Base: main ← Head: pdevine/mem-qwen


📝 Commits (9)

  • c1da264 smaller recurrent cache
  • 1ca5581 add qwen3.5
  • 2e340cb cleanup
  • d014247 address comments
  • 62f43d9 add MaxContextLength()
  • 2c1bb0f comments
  • 4c94527 replace Materialize()
  • 786e420 avoid copy
  • 46c79a3 comments

📊 Changes

14 files changed (+2405 additions, -49 deletions)


📝 x/create/create.go (+23 -4)
📝 x/create/create_test.go (+42 -0)
📝 x/mlxrunner/cache.go (+112 -22)
📝 x/mlxrunner/cache/cache.go (+12 -6)
➕ x/mlxrunner/cache/recurrent.go (+161 -0)
📝 x/mlxrunner/imports.go (+2 -0)
➕ x/mlxrunner/mlx/gated_delta.go (+370 -0)
📝 x/mlxrunner/mlx/mlx.go (+1 -1)
📝 x/mlxrunner/mlx/ops_extra.go (+61 -1)
📝 x/mlxrunner/pipeline.go (+25 -15)
📝 x/models/nn/nn.go (+34 -0)
➕ x/models/qwen3_5/qwen3_5.go (+1387 -0)
➕ x/models/qwen3_5/qwen3_5_test.go (+159 -0)
➕ x/models/qwen3_5_moe/qwen3_5_moe.go (+16 -0)

📄 Description

This change:

  • adds support for qwen3.5-next-moe models (qwen3-next/qwen3.5-next/qwen3-coder) to the MLX runner
  • introduces recurrent cache support and related MLX ops
  • adds a new hybrid cache type for storing mixed cache types
  • updates pipeline/runner integration and adds tests
  • properly quantizes stacked expert tensors
  • adds a Gated Delta Metal kernel for fast SSM inference
  • adds new MLX calls for Conv1d, DepthwideConv1d, Contiguous, Exp, Log, SoftmaxAxis
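To give a sense of what a "gated delta" recurrence computes, here is a minimal Go sketch of one step of a gated delta-rule update over a state matrix. This is an illustrative formulation of the delta-rule family used in SSM-style linear attention, not the PR's actual Metal kernel; the function name, shapes, and the exact placement of the decay term are assumptions.

```go
package main

import "fmt"

// gatedDeltaStep applies one step of a gated delta-rule recurrence to the
// state matrix S (dv x dk): S <- alpha*(S - beta*(S k) k^T) + beta * v k^T,
// then reads out o = S q. Illustrative sketch only; the PR's kernel fuses
// this per-token update on the GPU.
func gatedDeltaStep(S [][]float64, k, v, q []float64, alpha, beta float64) []float64 {
	dv, dk := len(S), len(k)

	// Sk = S k: the state's current prediction for key k.
	Sk := make([]float64, dv)
	for i := 0; i < dv; i++ {
		for j := 0; j < dk; j++ {
			Sk[i] += S[i][j] * k[j]
		}
	}

	// Rank-1 delta update with decay alpha and write strength beta:
	// erase the old association for k, then write the new value v.
	for i := 0; i < dv; i++ {
		for j := 0; j < dk; j++ {
			S[i][j] = alpha*(S[i][j]-beta*Sk[i]*k[j]) + beta*v[i]*k[j]
		}
	}

	// Readout o = S q.
	o := make([]float64, dv)
	for i := 0; i < dv; i++ {
		for j := 0; j < dk; j++ {
			o[i] += S[i][j] * q[j]
		}
	}
	return o
}

func main() {
	S := [][]float64{{0, 0}, {0, 0}}
	k := []float64{1, 0} // unit-norm key
	v := []float64{3, 4}
	// With alpha=1, beta=1 and a unit key, the update stores v exactly,
	// so querying with the same key recalls it.
	o := gatedDeltaStep(S, k, v, k, 1.0, 1.0)
	fmt.Println(o) // [3 4]
}
```

The point of a fused Metal kernel for this loop is that the update is inherently sequential over tokens, so per-step launch overhead dominates unless the whole recurrence runs on-device.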

There is also additional code for performance metrics and tuning, which can be turned on.
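The depthwise 1-D convolution in the new ops list is typically used as a short causal conv inside SSM-style blocks. As a rough illustration of the semantics (not the MLX call's actual signature), a per-channel causal convolution looks like this:

```go
package main

import "fmt"

// depthwiseCausalConv1d convolves each channel of x (shape [channels][T])
// with its own kernel w (shape [channels][K]), implicitly left-padding with
// zeros so that output t depends only on inputs at positions <= t.
// Sketch of the operation only; shapes and padding are assumptions.
func depthwiseCausalConv1d(x, w [][]float64) [][]float64 {
	out := make([][]float64, len(x))
	for c := range x {
		K, T := len(w[c]), len(x[c])
		out[c] = make([]float64, T)
		for t := 0; t < T; t++ {
			for k := 0; k < K; k++ {
				if idx := t - k; idx >= 0 {
					out[c][t] += w[c][k] * x[c][idx]
				}
			}
		}
	}
	return out
}

func main() {
	x := [][]float64{{1, 2, 3, 4}}
	w := [][]float64{{0.5, 0.5}} // 2-tap moving average for the single channel
	fmt.Println(depthwiseCausalConv1d(x, w)) // [[0.5 1.5 2.5 3.5]]
}
```

"Depthwise" here means each channel gets its own small kernel with no cross-channel mixing, which keeps the op cheap relative to a full Conv1d.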

Supersedes #14343
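On the "properly quantizes stacked expert tensors" point: when MoE expert weights are stacked into one tensor, quantizing with a single scale lets a large-magnitude expert wash out the precision of a small-magnitude one. A minimal sketch of per-expert symmetric int8 quantization (illustrative only; the PR's actual scheme and group sizes are not shown here):

```go
package main

import (
	"fmt"
	"math"
)

// quantizeExpert performs symmetric int8 quantization of one expert's
// weights with its own scale. Quantizing each slice of a stacked
// [numExperts][n] tensor separately preserves per-expert dynamic range.
func quantizeExpert(w []float64) (q []int8, scale float64) {
	var maxAbs float64
	for _, v := range w {
		maxAbs = math.Max(maxAbs, math.Abs(v))
	}
	if maxAbs == 0 {
		return make([]int8, len(w)), 1
	}
	scale = maxAbs / 127
	q = make([]int8, len(w))
	for i, v := range w {
		q[i] = int8(math.Round(v / scale))
	}
	return q, scale
}

func main() {
	// Two experts with very different magnitudes: a shared scale would
	// round the first expert's weights to zero.
	experts := [][]float64{
		{0.01, -0.02, 0.015},
		{5.0, -3.0, 1.0},
	}
	for i, w := range experts {
		q, s := quantizeExpert(w)
		fmt.Printf("expert %d: q=%v scale=%g\n", i, q, s)
	}
}
```

Dequantization is `float64(q[i]) * scale`, with per-expert error bounded by half a quantization step.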


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 01:00:24 -05:00
Reference: github-starred/ollama#14669