[PR #15899] llama: add qwen35/qwen35moe architecture support for community GGUFs #77642

Open
opened 2026-05-05 10:19:08 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15899
Author: @ArkaD171717
Created: 4/30/2026
Status: 🔄 Open

Base: main ← Head: fix/qwen35moe-arch-support


📝 Commits (1)

  • 4b8af48 llama: add qwen35/qwen35moe architecture support for community GGUFs

📊 Changes

6 files changed (+1038 additions, -0 deletions)

View changed files

📝 llama/llama.cpp/src/llama-arch.cpp (+67 -0)
📝 llama/llama.cpp/src/llama-arch.h (+4 -0)
📝 llama/llama.cpp/src/llama-model.cpp (+162 -0)
📝 llama/llama.cpp/src/llama-model.h (+5 -0)
📝 llama/llama.cpp/src/models/models.h (+51 -0)
➕ llama/llama.cpp/src/models/qwen35.cpp (+749 -0)

📄 Description

Community GGUFs (e.g. bartowski's) use upstream llama.cpp's converter, which writes "qwen35moe" as the architecture string. Ollama's vendored llama.cpp only recognizes "qwen3next", causing "unknown model architecture: 'qwen35moe'" errors when loading these files.

This adds full graph-building support for the qwen35 and qwen35moe architectures. The key differences from qwen3next are:

  • Separate attn_qkv (QKV) + attn_gate (Z) projections instead of combined ssm_in (QKVZ)
  • Separate ssm_alpha and ssm_beta tensors instead of combined ssm_beta_alpha
  • IMROPE (ggml_rope_multi with sections) instead of NEOX (ggml_rope_ext)
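The first two bullets can be sketched numerically. This is a hedged illustration with made-up dimensions, not the PR's actual code: splitting one combined projection into separate tensors changes only the GGUF tensor layout, not the computed values.

```python
import numpy as np

# Illustrative sketch only: dimensions and weights are invented,
# and tensor names in comments refer to the GGUF names above.
d_model, d_qkv, d_gate = 8, 12, 4

rng = np.random.default_rng(0)
x = rng.standard_normal(d_model)

# qwen3next-style: one combined ssm_in projection producing QKV and Z together
w_qkvz = rng.standard_normal((d_qkv + d_gate, d_model))
qkvz = w_qkvz @ x
qkv_combined, z_combined = qkvz[:d_qkv], qkvz[d_qkv:]

# qwen35-style: the same rows stored as separate attn_qkv and attn_gate tensors
w_qkv = w_qkvz[:d_qkv]
w_z = w_qkvz[d_qkv:]
qkv_split, z_split = w_qkv @ x, w_z @ x

# Both layouts compute identical values; only the on-disk tensor split differs.
assert np.allclose(qkv_combined, qkv_split)
assert np.allclose(z_combined, z_split)
```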

The delta-net chunked/autoregressive math, conv1d pipeline, gated normalization, and MoE FFN logic are identical to qwen3next.
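For context, the per-token recurrence behind delta-net layers can be sketched as below. This is one common formulation of the gated delta rule, hedged as an illustration rather than the exact math in the PR; the `alpha`/`beta` arguments loosely correspond to the separate ssm_alpha/ssm_beta tensors mentioned above.

```python
import numpy as np

# Hedged sketch of a gated delta-rule step (illustrative formulation only):
#   S <- alpha * S                      decay the running state
#   S <- S + beta * (v - S k) k^T       delta-rule write toward v
#   o  = S q                            per-token readout
def delta_net_step(S, k, v, q, alpha, beta):
    S = alpha * S
    S = S + beta * np.outer(v - S @ k, k)
    return S, S @ q

rng = np.random.default_rng(1)
d_k, d_v, T = 4, 3, 5
S = np.zeros((d_v, d_k))
outs = []
for t in range(T):
    k = rng.standard_normal(d_k)
    k /= np.linalg.norm(k)          # unit-norm key for the check below
    v = rng.standard_normal(d_k - 1)  # d_v = 3
    q = rng.standard_normal(d_k)
    S, o = delta_net_step(S, k, v, q, alpha=0.9, beta=1.0)
    outs.append(o)

# With beta = 1 and a unit-norm key, the update makes the state map the
# current key exactly to its value: S @ k == v after the step.
assert np.allclose(S @ k, v)
assert len(outs) == T
```

The chunked variant referenced in the PR computes the same recurrence in blocks for parallelism; the sequential loop here is the autoregressive form.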

Fixes #15898

Note: This will be superseded by #15122 when it lands; it is intended as a stopgap for users currently blocked on this.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:19:08 -05:00

Reference: github-starred/ollama#77642