[GH-ISSUE #14545] qwen3.5 HuggingFace GGUFs fail to load - missing tensor 'blk.0.ssm_in.weight' #71496

Closed
opened 2026-05-05 01:54:30 -05:00 by GiteaMirror · 8 comments

Originally created by @terjenauf on GitHub (Mar 2, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14545

What is the issue?

Environment:

  • Ollama version: 0.17.4
  • OS: Ubuntu Server 24.04 LTS
  • GPU: 4x NVIDIA (3x RTX 3060 12GB + 1x RTX 2060 12GB) = 48GB total VRAM
  • CPU: Intel Core i7-4820K @ 3.70GHz
  • RAM: 16GB DDR3

Issue:
Loading HuggingFace GGUF variants of qwen3.5 models fails with:

500: llama runner process has terminated: error loading model:
missing tensor 'blk.0.ssm_in.weight'

Tested models that FAIL:

  • hf.co/bartowski/Qwen_Qwen3.5-27B-GGUF:Q6_K_L
  • bazobehram/qwen3-coder-next (community build based on Unsloth GGUF)
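
Running either model directly reproduces the failure; for example (a console sketch; the exact client formatting may vary, but the error is the one quoted above):

$ ollama run hf.co/bartowski/Qwen_Qwen3.5-27B-GGUF:Q6_K_L
Error: 500: llama runner process has terminated: error loading model:
missing tensor 'blk.0.ssm_in.weight'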

Official Ollama library versions that WORK:

  • qwen3.5:27b-q8_0
  • qwen3-coder-next:q4_K_M

Expected behavior:
HuggingFace GGUF variants should load correctly, allowing
users to run alternative quantizations not available in
the official Ollama library.

Actual behavior:
Model fails to load with missing tensor error, suggesting
llama.cpp lacks support for the DeltaNet/SSM hybrid
architecture used in the Qwen3.5 family.
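
One way to check whether the tensor is actually absent from the GGUF file (rather than present but unrecognized by the loader) is to dump the file's tensor names with the gguf-dump tool from the gguf Python package; a sketch, assuming the GGUF has been downloaded locally (the filename here is hypothetical):

$ pip install gguf
$ gguf-dump Qwen_Qwen3.5-27B-Q6_K_L.gguf | grep ssm_in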

Additional context:
The official Ollama library versions of these models work
correctly, indicating Ollama has implemented a workaround
internally. However, users cannot access alternative
quantizations (e.g. Q6_K, Q5_K_M) that would provide
better speed/quality tradeoffs for systems with limited VRAM.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.17.4

GiteaMirror added the bug label 2026-05-05 01:54:30 -05:00

@TTDiang2 commented on GitHub (Mar 2, 2026):

Adding some further observations that might help pinpoint the root cause.

I've noticed that attempting to run the official Ollama-pulled qwen3.5-35b-a3b (MoE) model with a standalone llama.cpp build (b8170) fails with a hyperparameter length error:

llama_model_load: error loading model: error loading model hyperparameters: key qwen35moe.rope.dimension_sections has wrong array length; expected 4, got 3
llama_model_load_from_file_impl: failed to load model

This suggests a discrepancy in how the tensors and hyperparameters are structured between the official Ollama library models and the current standard in llama.cpp. This also seems linked to the Vision issues reported in #14508; the official Ollama versions appear to be missing the necessary Vision tensors.
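
For anyone who wants to verify this locally, the gguf-dump tool from the gguf Python package can print the KV metadata of the downloaded Ollama blob; a sketch (the sha256 digest path under ~/.ollama/models/blobs is a placeholder):

$ gguf-dump --no-tensors ~/.ollama/models/blobs/sha256-<digest> | grep dimension_sections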

Root Cause Analysis:

Looking at the llama.cpp commit history, there was a specific sequence of updates for Qwen3.5 support:

  • b7973: Initial support for Qwen3.5 dense and MoE was added, but explicitly stated "no vision" (#19435).
  • b7976: This "imperfect" support was reverted.
  • b7990: Qwen3.5 support was re-implemented fully and correctly.

It appears Ollama (v0.17.4) might still be utilizing an implementation based on the earlier, pre-b7990 logic. This explains why community GGUFs from Hugging Face (built on the finalized llama.cpp spec) fail in Ollama, and why Ollama's own Qwen3.5 models are incompatible with newer standalone llama.cpp builds.

Suggested Resolution:

To resolve these compatibility issues, the Ollama team likely needs to:

  • Re-sync/Update the underlying llama.cpp runner to a version post-b7990.
  • Refresh/Re-quantize the models in the official Ollama library to align with the finalized tensor structure.

Environment:

Ollama version: 0.17.4


@rick-github commented on GitHub (Mar 2, 2026):

What is the output from ollama -v?

hf.co/bartowski/Qwen_Qwen3.5-27B-GGUF:Q6_K_L is a split model (separate text and vision GGUFs) and won't be supported until #14134 is merged.

bazobehram/qwen3-coder-next is missing tool support in the template. Try frob/qwen3-coder-next.
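
(A pulled model's template can be inspected to verify this, e.g.:

$ ollama show bazobehram/qwen3-coder-next --template

and compared against a model that declares tool support.)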


@rick-github commented on GitHub (Mar 2, 2026):

@TTDiang2

It appears Ollama (v0.17.4) might still be utilizing an implementation based on the earlier, pre-b7990 logic. This explains why community GGUFs from Hugging Face (built on the finalized llama.cpp spec) fail in Ollama, and why Ollama's own Qwen3.5 models are incompatible with newer standalone llama.cpp builds.

Ollama has its own implementation of qwen35. Please don't post AI content.


@TTDiang2 commented on GitHub (Mar 2, 2026):

Sorry, but it's not AI-generated, it's AI-translated. I'm not a native English speaker :)


@terjenauf commented on GitHub (Mar 2, 2026):

Thanks for your clarification regarding the bartowski model. That explained the issue. I tried frob/qwen3-coder-next q4_K_M.

Unfortunately I ran into problems using the model. I am low on RAM and limited to 48GB of VRAM split across 4 GPUs.
ollama -v gives me "ollama version is 0.17.4".

Is the missing tensor 'blk.0.ssm_in.weight' error I get specifically related to the split-model issue, or is it a separate llama.cpp architecture-support problem for the DeltaNet/SSM hybrid?

And two more things :-) Like TTDiang2, I'm not a native English speaker, so I have to ask Claude every now and then if my writing is OK. And I am not very experienced with Ollama beyond basic use.


@rick-github commented on GitHub (Mar 2, 2026):

Unfortunately I ran into problems using the model.

The model works with 0.17.4:

$ ollama run frob/qwen3-coder-next:latest hello
Hello! How can I help you today? 😊

The reason I asked about the version is that 'blk.0.ssm_in.weight' errors are from 0.15.5 (#14133). Is "ollama version is 0.17.4" the only output from ollama -v?
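
(For context: if the server binary were older than the client, ollama -v would print a second line. A hypothetical mismatch, with a stale 0.15.5 server still running behind a 0.17.4 client, would look like:

$ ollama -v
ollama version is 0.15.5
Warning: client version is 0.17.4

A single line suggests client and server agree.)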

Like TTDiang2, I'm not a native English speaker, so I have to ask Claude every now and then if my writing is OK.

Translation is fine. The issue is that posting AI-generated content that is incorrect gives it an authoritative feel, which can mislead others.

And I am not very experienced with Ollama beyond basic use.

No problem, that's why questions are welcomed.


@terjenauf commented on GitHub (Mar 2, 2026):

Thanks, then I will wait in excitement for #14134 to be closed 👍


@rick-github commented on GitHub (Mar 2, 2026):

#14134 will not fix blk.0.ssm_in.weight errors.
