[GH-ISSUE #14587] qwen3next: layer 0 missing attn_qkv/attn_gate projections #55971

Closed
opened 2026-04-29 10:05:14 -05:00 by GiteaMirror · 18 comments
Owner

Originally created by @fcorneli on GitHub (Mar 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14587

What is the issue?

Since Ollama 0.17.5 I get the following error:

ollama run qwen3-next:80b-a3b-instruct-q4_K_M
Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections

Relevant log output


OS

Linux

GPU

NVIDIA RTX PRO 6000 Blackwell Workstation Edition

CPU

Intel(R) Core(TM) Ultra 9 285K

Ollama version

0.17.5

GiteaMirror added the bug label 2026-04-29 10:05:14 -05:00
Author
Owner

@YetheSamartaka commented on GitHub (Mar 3, 2026):

I have the same issue with qwen3-next:80b-a3b-thinking-q8_0. The issue appeared in version 0.17.5.


@fcorneli commented on GitHub (Mar 3, 2026):

Indeed:

ollama run qwen3-next:80b-a3b-thinking-q8_0
Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections

@Nyx1197 commented on GitHub (Mar 4, 2026):

qwen3-next on 0.17.4 is working fine, but I can't run qwen3.5.


@fcorneli commented on GitHub (Mar 4, 2026):

Still the same error on Ollama version 0.17.6.


@D337z commented on GitHub (Mar 5, 2026):

I can't seem to use any of the Qwen Next and Qwen3.5 models since the update. It's been completely broken.


@marcinm1234 commented on GitHub (Mar 5, 2026):

[BUG] qwen3-next:80b-a3b-thinking-q8_0 fails to load on GPU after update to 0.17.5 — layer 0 missing attn_qkv/attn_gate projections


Bug Description

After updating Ollama to version 0.17.5, model qwen3-next:80b-a3b-thinking-q8_0
fails to initialize on GPU with the following error:

failed to initialize model: layer 0 missing attn_qkv/attn_gate projections (status code: 500)

Both verification attempts fail, resulting in complete inability to run the model.
The model worked correctly on Ollama 0.17.4 and earlier. This is a regression
introduced in 0.17.5.


Steps to Reproduce

  1. Install Ollama 0.17.5
  2. Run: ollama run qwen3-next:80b-a3b-thinking-q8_0
  3. Observe error in logs on both GPU initialization attempts

Expected Behavior

Model loads and runs correctly on GPU, as it did on Ollama 0.17.4.


Actual Behavior

2026-03-05 10:11:23,113 | ERROR   | GPU Ollama call failed for model
  qwen3-next:80b-a3b-thinking-q8_0: failed to initialize model:
  layer 0 missing attn_qkv/attn_gate projections (status code: 500)


Root Cause Analysis

qwen3-next uses a hybrid SSM+Transformer architecture (DeltaNet-style).
Some layers do not contain attn_qkv / attn_gate projections — they use SSM
tensors instead. It appears Ollama 0.17.5 introduced stricter tensor validation
for hybrid model architectures that incorrectly rejects these layers, causing
initialization to fail entirely.

The model GGUF files have not changed — the regression is on Ollama's side.


Environment

| Component  | Details                                      |
|------------|----------------------------------------------|
| Ollama     | 0.17.5 (regression from 0.17.4)              |
| OS         | Windows 10 IoT Enterprise LTSC (Build 19044) |
| GPU        | 4× NVIDIA GeForce RTX 3090                   |
| VRAM       | 4× 24,576 MiB (96 GB total)                  |
| GPU Driver | 591.74                                       |
| CPU        | Intel Xeon E5-2699A v4 @ 2.40 GHz, 22 cores  |
| RAM        | 512 GB                                       |

GPU Memory at Time of Error

| GPU   | Free VRAM  | State       |
|-------|------------|-------------|
| GPU 0 | 23,896 MiB | Nearly idle |
| GPU 1 | 1,382 MiB  | In use      |
| GPU 2 | 1,796 MiB  | In use      |
| GPU 3 | 690 MiB    | In use      |

Workaround

Downgrading to Ollama 0.17.4 fully resolves the issue.



Related Issues

  • #14587
  • #14433

Regression introduced in: 0.17.5
Last working version: 0.17.4


@fcorneli commented on GitHub (Mar 5, 2026):

It appears Ollama 0.17.5 introduced stricter tensor validation for hybrid model architectures that incorrectly rejects these layers, causing initialization to fail entirely.

The code where it actually fails:
https://github.com/ollama/ollama/blob/main/model/models/qwen3next/model.go#L457


@MarkMuravev commented on GitHub (Mar 6, 2026):

+1


@MarkMuravev commented on GitHub (Mar 6, 2026):

Still the same error on ollama version 0.17.7


@fcorneli commented on GitHub (Mar 6, 2026):

Quick-and-dirty patch:

diff --git a/model/models/qwen3next/model.go b/model/models/qwen3next/model.go
index 9681efda..5bc724c2 100644
--- a/model/models/qwen3next/model.go
+++ b/model/models/qwen3next/model.go
@@ -454,9 +454,9 @@ func (m *Model) Validate() error {
                if !ok || gdn == nil {
                        return fmt.Errorf("qwen3next: layer %d expected recurrent operator", i)
                }
-               if gdn.SSMQKV == nil || gdn.SSMQKVGate == nil {
-                       return fmt.Errorf("qwen3next: layer %d missing attn_qkv/attn_gate projections", i)
-               }
+               //if gdn.SSMQKV == nil || gdn.SSMQKVGate == nil {
+               //      return fmt.Errorf("qwen3next: layer %d missing attn_qkv/attn_gate projections", i)
+               //}
                if gdn.SSMBetaAlpha == nil && (gdn.SSMBeta == nil || gdn.SSMAlpha == nil) {
                        return fmt.Errorf("qwen3next: layer %d missing linear attention beta/alpha projections", i)

does not help. Now it gets stuck somewhere else:

panic: failed to build graph: qwen3next: missing attn_qkv/attn_gate projections (legacy ssm_in is not supported)

@fcorneli commented on GitHub (Mar 6, 2026):

I've let Claude Code go wild on this one. It works somehow.
https://github.com/ollama/ollama/pull/14675


@fcorneli commented on GitHub (Mar 6, 2026):

Below is what this thing had to say:

qwen3next: layer 0 missing attn_qkv/attn_gate projections

Since Ollama version 0.17.5 Ollama can no longer load qwen3-next:80b.
We get the following error:

Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections

On Ollama version 0.17.4 this still worked.

The issue has been reported at:
https://github.com/ollama/ollama/issues/14587

Verification

Run the Ollama server via:

go run . serve

Via:

go run . run qwen3-next:80b

you can verify whether the model loads or not.

Analysis and Patch

Root Cause

Commit 8da09b1e ("qwen3next: add compatibility with imported GGUF models") introduced:

  1. A Validate() function that runs after tensor loading to check for required tensors.
  2. A compatibility path in inferRecurrentLayers() for imported GGUFs where
    attention.head_count_kv is stored as a scalar (all-non-zero after expansion).

For GGUFs imported from llama.cpp (which is what qwen3-next:80b on Ollama Hub is),
the attention.head_count_kv scalar causes the compatibility path to correctly
infer which layers are recurrent. However, the imported GGUF uses
blk.N.ssm_in.weight — a combined tensor that interleaves Q, K, V, Z per K-head —
instead of the split blk.N.attn_qkv.weight + blk.N.attn_gate.weight tensors
that Ollama-native conversions produce.

Because populateFields cannot find blk.0.attn_qkv.weight, it leaves
GatedDeltaNet.SSMQKV = nil. The new Validate() then immediately fails with
"layer 0 missing attn_qkv/attn_gate projections".

In v0.17.4, the old New() code required a mix of zero and non-zero values in
headCountKV. For imported GGUFs (scalar, all-non-zero), it failed earlier with
"invalid attention.head_count_kv array; expected mix of zero and non-zero values".
The 8da09b1e commit fixed that first failure but exposed the tensor-name mismatch.

ssm_in layout

For each K-head j (j = 0 .. numKHeads−1), ssm_in.weight stores
qkvzDim = 2*headKDim + 2*vPerHead rows:

rows [j*qkvzDim .. j*qkvzDim + headKDim)           → Q for K-head j
rows [j*qkvzDim + headKDim .. j*qkvzDim + 2*headKDim) → K for K-head j
rows [j*qkvzDim + 2*headKDim .. j*qkvzDim + 2*headKDim + vPerHead) → V
rows [j*qkvzDim + 2*headKDim + vPerHead .. (j+1)*qkvzDim)          → Z

where vPerHead = headVDim * numVHeads / numKHeads.
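[Editor's note] The row layout above can be sketched as a small Go helper that computes the Q/K/V/Z row ranges for a given K-head. The function name and the concrete dimensions in `main` are illustrative assumptions for this sketch, not values taken from the model's actual config:

```go
package main

import "fmt"

// qkvzRanges returns the half-open row ranges [start, end) of the Q, K, V and
// Z sub-blocks for K-head j inside ssm_in.weight, following the layout
// described in the comment above.
func qkvzRanges(j, headKDim, headVDim, numKHeads, numVHeads int) (q, k, v, z [2]int) {
	vPerHead := headVDim * numVHeads / numKHeads // V rows attributed to each K-head
	qkvzDim := 2*headKDim + 2*vPerHead           // total rows per K-head
	base := j * qkvzDim
	q = [2]int{base, base + headKDim}
	k = [2]int{base + headKDim, base + 2*headKDim}
	v = [2]int{base + 2*headKDim, base + 2*headKDim + vPerHead}
	z = [2]int{base + 2*headKDim + vPerHead, base + qkvzDim}
	return
}

func main() {
	// Hypothetical example dimensions: headKDim=128, headVDim=128,
	// 16 K-heads, 32 V-heads (so vPerHead=256 and qkvzDim=768).
	for j := 0; j < 2; j++ {
		q, k, v, z := qkvzRanges(j, 128, 128, 16, 32)
		fmt.Printf("head %d: Q%v K%v V%v Z%v\n", j, q, k, v, z)
	}
}
```

With these example dimensions, head 0 occupies rows 0–767 (Q 0–127, K 128–255, V 256–511, Z 512–767) and head 1 starts at row 768.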

Fix

Two small changes:

model/models/qwen3next/deltanet.go

  • Added SSMIn *nn.Linear with struct tag gguf:"ssm_in" to GatedDeltaNet.
  • Replaced the hard failure on missing SSMQKV/SSMQKVGate with a switch:
    • If SSMQKV + SSMQKVGate are present: existing fast path (unchanged).
    • If SSMIn is present: reshape the combined projection to
      [qkvzDim, numKHeads, nSeqTokens, nSeqs], then use Slice + Contiguous
      to extract Q, K, V, Z in the same memory layout that attn_qkv/attn_gate
      produce, and concatenate Q+K+V into qkvMixed. The rest of Forward() is
      unchanged.

model/models/qwen3next/model.go

  • In Validate(), changed the check to:
    if gdn.SSMIn == nil && (gdn.SSMQKV == nil || gdn.SSMQKVGate == nil) {
    
    so that a layer carrying ssm_in is accepted.
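[Editor's note] The relaxed check can be illustrated as a minimal, self-contained Go sketch. The types here are hypothetical stand-ins; the real fields live on Ollama's GatedDeltaNet struct in the qwen3next package:

```go
package main

import "fmt"

// tensor is a hypothetical stand-in for *nn.Linear.
type tensor struct{}

// gatedDeltaNet mirrors only the three fields the check inspects.
type gatedDeltaNet struct {
	SSMIn, SSMQKV, SSMQKVGate *tensor
}

// validateProjections accepts a layer that carries either the split
// attn_qkv/attn_gate tensors (Ollama-native conversion) or the combined
// ssm_in tensor (llama.cpp-imported GGUF).
func validateProjections(i int, gdn *gatedDeltaNet) error {
	if gdn.SSMIn == nil && (gdn.SSMQKV == nil || gdn.SSMQKVGate == nil) {
		return fmt.Errorf("qwen3next: layer %d missing attn_qkv/attn_gate projections", i)
	}
	return nil
}

func main() {
	native := &gatedDeltaNet{SSMQKV: &tensor{}, SSMQKVGate: &tensor{}}
	imported := &gatedDeltaNet{SSMIn: &tensor{}}
	empty := &gatedDeltaNet{}

	fmt.Println(validateProjections(0, native))   // <nil>
	fmt.Println(validateProjections(0, imported)) // <nil>
	fmt.Println(validateProjections(0, empty))    // non-nil error
}
```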

@marcinm1234 commented on GitHub (Mar 13, 2026):

Aaaa, this still is not fixed?? Tried the latest version, re-downloaded qwen3-next:80b-a3b-thinking-q8_0, still the same :( Rolling back to 0.17.4.


@D337z commented on GitHub (Mar 13, 2026):

There are just a few things that need to be fixed in the code. Don't
worry, I am actively working on a patch for it, but I have a life of my own
to handle. On top of that, there's no guarantee that the patch will be
accepted when I'm done as I'm not the creator nor maintainer of the
project. I'm also working on the Vulkan fixes to properly address dynamic
memory boundary shifts as well as possibly adding a dynamic anti-lag method
to allow the system to remain stable during heavy inference, but that's on
the back burner as it requires a lot more tie-ins and a rework of the
memory management that would not be easy. But don't worry, it isn't being
ignored entirely.



@Glitch3dPenguin commented on GitHub (Mar 14, 2026):

I can confirm that with Ollama version 0.18.0 and Open WebUI version 0.8.10 this is still an issue. Thanks for all the hard work on this issue!


@heapsoftware commented on GitHub (Mar 14, 2026):

Same issue here running with two 5090s, still on v0.17.8-rc4.


@Node0 commented on GitHub (Mar 25, 2026):

Below is what this thing had to say:

qwen3next: layer 0 missing attn_qkv/attn_gate projections

Since Ollama version 0.17.5 Ollama can no longer load qwen3-next:80b. We get the following error:

Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections

On Ollama version 0.17.4 this still worked.

The issue has been reported at: #14587

Verification

Run the Ollama server via:

go run . serve

Via:

go run . run qwen3-next:80b

you can verify whether the model loads or not.

Analysis and Patch

Root Cause

Commit 8da09b1e ("qwen3next: add compatibility with imported GGUF models") introduced:

  1. A Validate() function that runs after tensor loading to check for required tensors.
  2. A compatibility path in inferRecurrentLayers() for imported GGUFs where
    attention.head_count_kv is stored as a scalar (all-non-zero after expansion).

For GGUFs imported from llama.cpp (which is what qwen3-next:80b on Ollama Hub is), the attention.head_count_kv scalar causes the compatibility path to correctly infer which layers are recurrent. However, the imported GGUF uses blk.N.ssm_in.weight — a combined tensor that interleaves Q, K, V, Z per K-head — instead of the split blk.N.attn_qkv.weight + blk.N.attn_gate.weight tensors that Ollama-native conversions produce.

Because populateFields cannot find blk.0.attn_qkv.weight, it leaves GatedDeltaNet.SSMQKV = nil. The new Validate() then immediately fails with "layer 0 missing attn_qkv/attn_gate projections".

In v0.17.4, the old New() code required a mix of zero and non-zero values in headCountKV. For imported GGUFs (scalar, all-non-zero), it failed earlier with "invalid attention.head_count_kv array; expected mix of zero and non-zero values". The 8da09b1e commit fixed that first failure but exposed the tensor-name mismatch.

ssm_in layout

For each K-head j (j = 0 .. numKHeads−1), ssm_in.weight stores qkvzDim = 2*headKDim + 2*vPerHead rows:

rows [j*qkvzDim .. j*qkvzDim + headKDim)           → Q for K-head j
rows [j*qkvzDim + headKDim .. j*qkvzDim + 2*headKDim) → K for K-head j
rows [j*qkvzDim + 2*headKDim .. j*qkvzDim + 2*headKDim + vPerHead) → V
rows [j*qkvzDim + 2*headKDim + vPerHead .. (j+1)*qkvzDim)          → Z

where vPerHead = headVDim * numVHeads / numKHeads.

Fix

Two small changes:

model/models/qwen3next/deltanet.go

  • Added SSMIn *nn.Linear with struct tag gguf:"ssm_in" to GatedDeltaNet.

  • Replaced the hard failure on missing SSMQKV/SSMQKVGate with a switch:

    • If SSMQKV + SSMQKVGate are present: existing fast path (unchanged).
    • If SSMIn is present: reshape the combined projection to
      [qkvzDim, numKHeads, nSeqTokens, nSeqs], then use Slice + Contiguous
      to extract Q, K, V, Z in the same memory layout that attn_qkv/attn_gate
      produce, and concatenate Q+K+V into qkvMixed. The rest of Forward() is
      unchanged.

model/models/qwen3next/model.go

  • In Validate(), changed the check to:
    if gdn.SSMIn == nil && (gdn.SSMQKV == nil || gdn.SSMQKVGate == nil) {

    so that a layer carrying ssm_in is accepted.

Thanks for your work on this.
I'll check out your changes and attempt to test them as well.
We need to get this model back up on its feet, especially after what happened at Tongyi Lab, with its creators leaving for Meta (we're unlikely to see further evolution of the Qwen3-Next family from the source).


@fcorneli commented on GitHub (Mar 29, 2026):

@jmorganca Thanks!

Reference: github-starred/ollama#55971