[GH-ISSUE #14587] qwen3next: layer 0 missing attn_qkv/attn_gate projections #55971

Closed
opened 2026-04-29 10:05:14 -05:00 by GiteaMirror · 18 comments
Owner

Originally created by @fcorneli on GitHub (Mar 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14587

What is the issue?

Since Ollama 0.17.5 I get the following error:

ollama run qwen3-next:80b-a3b-instruct-q4_K_M
Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections

Relevant log output


OS

Linux

GPU

NVIDIA RTX PRO 6000 Blackwell Workstation Edition

CPU

Intel(R) Core(TM) Ultra 9 285K

Ollama version

0.17.5

GiteaMirror added the bug label 2026-04-29 10:05:14 -05:00
Author
Owner

@YetheSamartaka commented on GitHub (Mar 3, 2026):

I have the same issue with qwen3-next:80b-a3b-thinking-q8_0. The issue appeared in version 0.17.5.


@fcorneli commented on GitHub (Mar 3, 2026):

Indeed:

ollama run qwen3-next:80b-a3b-thinking-q8_0
Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections

@Nyx1197 commented on GitHub (Mar 4, 2026):

qwen3-next on 0.17.4 is working fine, but I can't run qwen3.5.


@fcorneli commented on GitHub (Mar 4, 2026):

Still the same error on Ollama version 0.17.6.


@D337z commented on GitHub (Mar 5, 2026):

I can't seem to use any of the Qwen Next and Qwen3.5 models since the update. It's been completely broken.


@marcinm1234 commented on GitHub (Mar 5, 2026):

[BUG] qwen3-next:80b-a3b-thinking-q8_0 fails to load on GPU after update to 0.17.5 — layer 0 missing attn_qkv/attn_gate projections


Bug Description

After updating Ollama to version 0.17.5, model qwen3-next:80b-a3b-thinking-q8_0
fails to initialize on GPU with the following error:

failed to initialize model: layer 0 missing attn_qkv/attn_gate projections (status code: 500)

Both verification attempts fail, resulting in complete inability to run the model.
The model worked correctly on Ollama 0.17.4 and earlier. This is a regression
introduced in 0.17.5.


Steps to Reproduce

  1. Install Ollama 0.17.5
  2. Run: ollama run qwen3-next:80b-a3b-thinking-q8_0
  3. Observe error in logs on both GPU initialization attempts

Expected Behavior

Model loads and runs correctly on GPU, as it did on Ollama 0.17.4.


Actual Behavior

2026-03-05 10:11:23,113 | ERROR   | GPU Ollama call failed for model
  qwen3-next:80b-a3b-thinking-q8_0: failed to initialize model:
  layer 0 missing attn_qkv/attn_gate projections (status code: 500)


Root Cause Analysis

qwen3-next uses a hybrid SSM+Transformer architecture (DeltaNet-style).
Some layers do not contain attn_qkv / attn_gate projections — they use SSM
tensors instead. It appears Ollama 0.17.5 introduced stricter tensor validation
for hybrid model architectures that incorrectly rejects these layers, causing
initialization to fail entirely.

The model GGUF files have not changed — the regression is on Ollama's side.


Environment

| Component  | Details                                      |
|------------|----------------------------------------------|
| Ollama     | 0.17.5 (regression from 0.17.4)              |
| OS         | Windows 10 IoT Enterprise LTSC (Build 19044) |
| GPU        | 4× NVIDIA GeForce RTX 3090                   |
| VRAM       | 4× 24,576 MiB (96 GB total)                  |
| GPU Driver | 591.74                                       |
| CPU        | Intel Xeon E5-2699A v4 @ 2.40 GHz, 22 cores  |
| RAM        | 512 GB                                       |

GPU Memory at Time of Error

| GPU   | Free VRAM  | State       |
|-------|------------|-------------|
| GPU 0 | 23,896 MiB | Nearly idle |
| GPU 1 | 1,382 MiB  | In use      |
| GPU 2 | 1,796 MiB  | In use      |
| GPU 3 | 690 MiB    | In use      |

Workaround

Downgrading to Ollama 0.17.4 fully resolves the issue.



Related Issues

  • #14587
  • #14433

Regression introduced in: 0.17.5
Last working version: 0.17.4


@fcorneli commented on GitHub (Mar 5, 2026):

It appears Ollama 0.17.5 introduced stricter tensor validation for hybrid model architectures that incorrectly rejects these layers, causing initialization to fail entirely.

The code where it actually fails:
https://github.com/ollama/ollama/blob/main/model/models/qwen3next/model.go#L457


@MarkMuravev commented on GitHub (Mar 6, 2026):

+1


@MarkMuravev commented on GitHub (Mar 6, 2026):

Still the same error on ollama version 0.17.7


@fcorneli commented on GitHub (Mar 6, 2026):

Quick-and-dirty patch:

diff --git a/model/models/qwen3next/model.go b/model/models/qwen3next/model.go
index 9681efda..5bc724c2 100644
--- a/model/models/qwen3next/model.go
+++ b/model/models/qwen3next/model.go
@@ -454,9 +454,9 @@ func (m *Model) Validate() error {
                if !ok || gdn == nil {
                        return fmt.Errorf("qwen3next: layer %d expected recurrent operator", i)
                }
-               if gdn.SSMQKV == nil || gdn.SSMQKVGate == nil {
-                       return fmt.Errorf("qwen3next: layer %d missing attn_qkv/attn_gate projections", i)
-               }
+               //if gdn.SSMQKV == nil || gdn.SSMQKVGate == nil {
+               //      return fmt.Errorf("qwen3next: layer %d missing attn_qkv/attn_gate projections", i)
+               //}
                if gdn.SSMBetaAlpha == nil && (gdn.SSMBeta == nil || gdn.SSMAlpha == nil) {
                        return fmt.Errorf("qwen3next: layer %d missing linear attention beta/alpha projections", i)

does not help. Now it gets stuck somewhere else:

panic: failed to build graph: qwen3next: missing attn_qkv/attn_gate projections (legacy ssm_in is not supported)

@fcorneli commented on GitHub (Mar 6, 2026):

I've let Claude Code go wild on this one. It works somehow.
https://github.com/ollama/ollama/pull/14675


@fcorneli commented on GitHub (Mar 6, 2026):

Below is what this thing had to say:

qwen3next: layer 0 missing attn_qkv/attn_gate projections

Since Ollama version 0.17.5 Ollama can no longer load qwen3-next:80b.
We get the following error:

Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections

On Ollama version 0.17.4 this still worked.

The issue has been reported at:
https://github.com/ollama/ollama/issues/14587

Verification

Run the Ollama server via:

go run . serve

Via:

go run . run qwen3-next:80b

you can verify whether the model loads or not.

Analysis and Patch

Root Cause

Commit 8da09b1e ("qwen3next: add compatibility with imported GGUF models") introduced:

  1. A Validate() function that runs after tensor loading to check for required tensors.
  2. A compatibility path in inferRecurrentLayers() for imported GGUFs where
    attention.head_count_kv is stored as a scalar (all-non-zero after expansion).

For GGUFs imported from llama.cpp (which is what qwen3-next:80b on Ollama Hub is),
the attention.head_count_kv scalar causes the compatibility path to correctly
infer which layers are recurrent. However, the imported GGUF uses
blk.N.ssm_in.weight — a combined tensor that interleaves Q, K, V, Z per K-head —
instead of the split blk.N.attn_qkv.weight + blk.N.attn_gate.weight tensors
that Ollama-native conversions produce.

Because populateFields cannot find blk.0.attn_qkv.weight, it leaves
GatedDeltaNet.SSMQKV = nil. The new Validate() then immediately fails with
"layer 0 missing attn_qkv/attn_gate projections".

In v0.17.4, the old New() code required a mix of zero and non-zero values in
headCountKV. For imported GGUFs (scalar, all-non-zero), it failed earlier with
"invalid attention.head_count_kv array; expected mix of zero and non-zero values".
The 8da09b1e commit fixed that first failure but exposed the tensor-name mismatch.

ssm_in layout

For each K-head j (j = 0 .. numKHeads−1), ssm_in.weight stores
qkvzDim = 2*headKDim + 2*vPerHead rows:

rows [j*qkvzDim .. j*qkvzDim + headKDim)           → Q for K-head j
rows [j*qkvzDim + headKDim .. j*qkvzDim + 2*headKDim) → K for K-head j
rows [j*qkvzDim + 2*headKDim .. j*qkvzDim + 2*headKDim + vPerHead) → V
rows [j*qkvzDim + 2*headKDim + vPerHead .. (j+1)*qkvzDim)          → Z

where vPerHead = headVDim * numVHeads / numKHeads.
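[Editor's note] The row layout above can be sketched as a small Go helper that computes the Q/K/V/Z row ranges for a given K-head. The function name and the concrete dimensions in `main` are illustrative assumptions for this sketch, not values taken from the model's actual config:

```go
package main

import "fmt"

// qkvzRanges returns the half-open row ranges [start, end) of the Q, K, V and
// Z sub-blocks for K-head j inside ssm_in.weight, following the layout
// described in the comment above.
func qkvzRanges(j, headKDim, headVDim, numKHeads, numVHeads int) (q, k, v, z [2]int) {
	vPerHead := headVDim * numVHeads / numKHeads // V rows attributed to each K-head
	qkvzDim := 2*headKDim + 2*vPerHead           // total rows per K-head
	base := j * qkvzDim
	q = [2]int{base, base + headKDim}
	k = [2]int{base + headKDim, base + 2*headKDim}
	v = [2]int{base + 2*headKDim, base + 2*headKDim + vPerHead}
	z = [2]int{base + 2*headKDim + vPerHead, base + qkvzDim}
	return
}

func main() {
	// Hypothetical example dimensions: headKDim=128, headVDim=128,
	// 16 K-heads, 32 V-heads (so vPerHead=256 and qkvzDim=768).
	for j := 0; j < 2; j++ {
		q, k, v, z := qkvzRanges(j, 128, 128, 16, 32)
		fmt.Printf("head %d: Q%v K%v V%v Z%v\n", j, q, k, v, z)
	}
}
```

With these example dimensions, head 0 occupies rows 0–767 (Q 0–127, K 128–255, V 256–511, Z 512–767) and head 1 starts at row 768.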

Fix

Two small changes:

model/models/qwen3next/deltanet.go

  • Added SSMIn *nn.Linear with struct tag gguf:"ssm_in" to GatedDeltaNet.
  • Replaced the hard failure on missing SSMQKV/SSMQKVGate with a switch:
    • If SSMQKV + SSMQKVGate are present: existing fast path (unchanged).
    • If SSMIn is present: reshape the combined projection to
      [qkvzDim, numKHeads, nSeqTokens, nSeqs], then use Slice + Contiguous
      to extract Q, K, V, Z in the same memory layout that attn_qkv/attn_gate
      produce, and concatenate Q+K+V into qkvMixed. The rest of Forward() is
      unchanged.

model/models/qwen3next/model.go

  • In Validate(), changed the check to:
    if gdn.SSMIn == nil && (gdn.SSMQKV == nil || gdn.SSMQKVGate == nil) {
    
    so that a layer carrying ssm_in is accepted.
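[Editor's note] The relaxed check can be illustrated as a minimal, self-contained Go sketch. The types here are hypothetical stand-ins; the real fields live on Ollama's GatedDeltaNet struct in the qwen3next package:

```go
package main

import "fmt"

// tensor is a hypothetical stand-in for *nn.Linear.
type tensor struct{}

// gatedDeltaNet mirrors only the three fields the check inspects.
type gatedDeltaNet struct {
	SSMIn, SSMQKV, SSMQKVGate *tensor
}

// validateProjections accepts a layer that carries either the split
// attn_qkv/attn_gate tensors (Ollama-native conversion) or the combined
// ssm_in tensor (llama.cpp-imported GGUF).
func validateProjections(i int, gdn *gatedDeltaNet) error {
	if gdn.SSMIn == nil && (gdn.SSMQKV == nil || gdn.SSMQKVGate == nil) {
		return fmt.Errorf("qwen3next: layer %d missing attn_qkv/attn_gate projections", i)
	}
	return nil
}

func main() {
	native := &gatedDeltaNet{SSMQKV: &tensor{}, SSMQKVGate: &tensor{}}
	imported := &gatedDeltaNet{SSMIn: &tensor{}}
	empty := &gatedDeltaNet{}

	fmt.Println(validateProjections(0, native))   // <nil>
	fmt.Println(validateProjections(0, imported)) // <nil>
	fmt.Println(validateProjections(0, empty))    // non-nil error
}
```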

@marcinm1234 commented on GitHub (Mar 13, 2026):

Aaaa, this still is not fixed?? Tried the latest version, re-downloaded qwen3-next:80b-a3b-thinking-q8_0, still the same :( Rolling back to 0.17.4.


@D337z commented on GitHub (Mar 13, 2026):

There are just a few things that need to be fixed in the code. Don't
worry, I am actively working on a patch for it, but I have a life of my own
to handle. On top of that, there's no guarantee that the patch will be
accepted when I'm done as I'm not the creator nor maintainer of the
project. I'm also working on the Vulkan fixes to properly address dynamic
memory boundary shifts as well as possibly adding a dynamic anti-lag method
to allow the system to remain stable during heavy inference, but that's on
the back burner as it requires a lot more tie-ins and a rework of the
memory management that would not be easy. But don't worry, it isn't being
ignored entirely.



@Glitch3dPenguin commented on GitHub (Mar 14, 2026):

I can confirm that with Ollama version 0.18.0 and Open WebUI version 0.8.10 this is still an issue. Thanks for all the hard work on this issue!


@heapsoftware commented on GitHub (Mar 14, 2026):

Same issue here running with two 5090s, still on v0.17.8-rc4.


@Node0 commented on GitHub (Mar 25, 2026):

Below is what this thing had to say:

qwen3next: layer 0 missing attn_qkv/attn_gate projections

Since Ollama version 0.17.5 Ollama can no longer load qwen3-next:80b. We get the following error:

Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections

On Ollama version 0.17.4 this still worked.

The issue has been reported at: #14587

Verification

Run the Ollama server via:

go run . serve

Via:

go run . run qwen3-next:80b

you can verify whether the model loads or not.

Analysis and Patch

Root Cause

Commit 8da09b1e ("qwen3next: add compatibility with imported GGUF models") introduced:

  1. A Validate() function that runs after tensor loading to check for required tensors.
  2. A compatibility path in inferRecurrentLayers() for imported GGUFs where
    attention.head_count_kv is stored as a scalar (all-non-zero after expansion).

For GGUFs imported from llama.cpp (which is what qwen3-next:80b on Ollama Hub is), the attention.head_count_kv scalar causes the compatibility path to correctly infer which layers are recurrent. However, the imported GGUF uses blk.N.ssm_in.weight — a combined tensor that interleaves Q, K, V, Z per K-head — instead of the split blk.N.attn_qkv.weight + blk.N.attn_gate.weight tensors that Ollama-native conversions produce.

Because populateFields cannot find blk.0.attn_qkv.weight, it leaves GatedDeltaNet.SSMQKV = nil. The new Validate() then immediately fails with "layer 0 missing attn_qkv/attn_gate projections".

In v0.17.4, the old New() code required a mix of zero and non-zero values in headCountKV. For imported GGUFs (scalar, all-non-zero), it failed earlier with "invalid attention.head_count_kv array; expected mix of zero and non-zero values". The 8da09b1e commit fixed that first failure but exposed the tensor-name mismatch.

ssm_in layout

For each K-head j (j = 0 .. numKHeads−1), ssm_in.weight stores qkvzDim = 2*headKDim + 2*vPerHead rows:

rows [j*qkvzDim .. j*qkvzDim + headKDim)           → Q for K-head j
rows [j*qkvzDim + headKDim .. j*qkvzDim + 2*headKDim) → K for K-head j
rows [j*qkvzDim + 2*headKDim .. j*qkvzDim + 2*headKDim + vPerHead) → V
rows [j*qkvzDim + 2*headKDim + vPerHead .. (j+1)*qkvzDim)          → Z

where vPerHead = headVDim * numVHeads / numKHeads.

Fix

Two small changes:

model/models/qwen3next/deltanet.go

  • Added SSMIn *nn.Linear with struct tag gguf:"ssm_in" to GatedDeltaNet.

  • Replaced the hard failure on missing SSMQKV/SSMQKVGate with a switch:

    • If SSMQKV + SSMQKVGate are present: existing fast path (unchanged).
    • If SSMIn is present: reshape the combined projection to
      [qkvzDim, numKHeads, nSeqTokens, nSeqs], then use Slice + Contiguous
      to extract Q, K, V, Z in the same memory layout that attn_qkv/attn_gate
      produce, and concatenate Q+K+V into qkvMixed. The rest of Forward() is
      unchanged.

model/models/qwen3next/model.go

  • In Validate(), changed the check to:
    if gdn.SSMIn == nil && (gdn.SSMQKV == nil || gdn.SSMQKVGate == nil) {

    so that a layer carrying ssm_in is accepted.

Thanks for your work on this.
I'll check out your changes and attempt to test them as well.
We need to get this model back up on its feet, especially after what happened at Tongyi Lab, with its creators leaving for Meta (we're unlikely to see further evolution of the Qwen3-Next family from the source).


@fcorneli commented on GitHub (Mar 29, 2026):

@jmorganca Thanks!

Reference: github-starred/ollama#55971