[GH-ISSUE #14545] qwen3.5 HuggingFace GGUFs fail to load - missing tensor 'blk.0.ssm_in.weight' #71496

Closed
opened 2026-05-05 01:54:30 -05:00 by GiteaMirror · 8 comments

Originally created by @terjenauf on GitHub (Mar 2, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14545

What is the issue?

Environment:

  • Ollama version: 0.17.4
  • OS: Ubuntu Server 24.04 LTS
  • GPU: 4x NVIDIA (3x RTX 3060 12GB + 1x RTX 2060 12GB) = 48GB total VRAM
  • CPU: Intel Core i7-4820K @ 3.70GHz
  • RAM: 16GB DDR3

Issue:
Loading HuggingFace GGUF variants of qwen3.5 models fails with:

500: llama runner process has terminated: error loading model:
missing tensor 'blk.0.ssm_in.weight'

Tested models that FAIL:

  • hf.co/bartowski/Qwen_Qwen3.5-27B-GGUF:Q6_K_L
  • bazobehram/qwen3-coder-next (community build based on Unsloth GGUF)
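
Running either model directly reproduces the failure; for example (a console sketch; the exact client formatting may vary, but the error is the one quoted above):

$ ollama run hf.co/bartowski/Qwen_Qwen3.5-27B-GGUF:Q6_K_L
Error: 500: llama runner process has terminated: error loading model:
missing tensor 'blk.0.ssm_in.weight'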

Official Ollama library versions that WORK:

  • qwen3.5:27b-q8_0
  • qwen3-coder-next:q4_K_M

Expected behavior:
HuggingFace GGUF variants should load correctly, allowing
users to run alternative quantizations not available in
the official Ollama library.

Actual behavior:
Model fails to load with missing tensor error, suggesting
llama.cpp lacks support for the DeltaNet/SSM hybrid
architecture used in the Qwen3.5 family.
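
One way to check whether the tensor is actually absent from the GGUF file (rather than present but unrecognized by the loader) is to dump the file's tensor names with the gguf-dump tool from the gguf Python package; a sketch, assuming the GGUF has been downloaded locally (the filename here is hypothetical):

$ pip install gguf
$ gguf-dump Qwen_Qwen3.5-27B-Q6_K_L.gguf | grep ssm_in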

Additional context:
The official Ollama library versions of these models work
correctly, indicating Ollama has implemented a workaround
internally. However, users cannot access alternative
quantizations (e.g. Q6_K, Q5_K_M) that would provide
better speed/quality tradeoffs for systems with limited VRAM.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.17.4

GiteaMirror added the bug label 2026-05-05 01:54:30 -05:00

@TTDiang2 commented on GitHub (Mar 2, 2026):

Adding some further observations that might help pinpoint the root cause.

I've noticed that attempting to run the official Ollama-pulled qwen3.5-35b-a3b (MoE) model with a standalone llama.cpp build (b8170) fails with a hyperparameter length error:

llama_model_load: error loading model: error loading model hyperparameters: key qwen35moe.rope.dimension_sections has wrong array length; expected 4, got 3
llama_model_load_from_file_impl: failed to load model

This suggests a discrepancy in how the tensors and hyperparameters are structured between the official Ollama library models and the current standard in llama.cpp. This also seems linked to the Vision issues reported in #14508; the official Ollama versions appear to be missing the necessary Vision tensors.
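
For anyone who wants to verify this locally, the gguf-dump tool from the gguf Python package can print the KV metadata of the downloaded Ollama blob; a sketch (the sha256 digest path under ~/.ollama/models/blobs is a placeholder):

$ gguf-dump --no-tensors ~/.ollama/models/blobs/sha256-<digest> | grep dimension_sections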

Root Cause Analysis:

Looking at the llama.cpp commit history, there was a specific sequence of updates for Qwen3.5 support:

  • b7973: Initial support for Qwen3.5 dense and MoE was added, but explicitly stated "no vision" (#19435).
  • b7976: This "imperfect" support was reverted.
  • b7990: Qwen3.5 support was re-implemented fully and correctly.

It appears Ollama (v0.17.4) might still be utilizing an implementation based on the earlier, pre-b7990 logic. This explains why community GGUFs from Hugging Face (built on the finalized llama.cpp spec) fail in Ollama, and why Ollama's own Qwen3.5 models are incompatible with newer standalone llama.cpp builds.

Suggested Resolution:

To resolve these compatibility issues, the Ollama team likely needs to:

  • Re-sync/Update the underlying llama.cpp runner to a version post-b7990.
  • Refresh/Re-quantize the models in the official Ollama library to align with the finalized tensor structure.

Environment:

Ollama version: 0.17.4


@rick-github commented on GitHub (Mar 2, 2026):

What is the output from ollama -v?

hf.co/bartowski/Qwen_Qwen3.5-27B-GGUF:Q6_K_L is a split model (separate text and vision GGUFs) and won't be supported until #14134 is merged.

bazobehram/qwen3-coder-next is missing tool support in the template. Try frob/qwen3-coder-next.
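
(A pulled model's template can be inspected to verify this, e.g.:

$ ollama show bazobehram/qwen3-coder-next --template

and compared against a model that declares tool support.)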


@rick-github commented on GitHub (Mar 2, 2026):

@TTDiang2

It appears Ollama (v0.17.4) might still be utilizing an implementation based on the earlier, pre-b7990 logic. This explains why community GGUFs from Hugging Face (built on the finalized llama.cpp spec) fail in Ollama, and why Ollama's own Qwen3.5 models are incompatible with newer standalone llama.cpp builds.

Ollama has its own implementation of qwen35. Please don't post AI content.


@TTDiang2 commented on GitHub (Mar 2, 2026):

Sorry, but it's not AI-generated, it's AI-translated. I'm not a native English speaker :)


@terjenauf commented on GitHub (Mar 2, 2026):

Thanks for your clarification regarding the bartowski model. That explained the issue. I tried frob/qwen3-coder-next q4_K_M.

Unfortunately I ran into problems using the model. I am low on RAM and limited to 48GB of VRAM split across 4 GPUs.
ollama -v gives me "ollama version is 0.17.4".

Is the missing tensor 'blk.0.ssm_in.weight' error I get specifically related to the split-model issue, or is it a separate llama.cpp architecture-support problem for the DeltaNet/SSM hybrid?

And two more things :-) Like TTDiang2, I'm not a native English speaker, so I have to ask Claude every now and then if my writing is OK. And I am not very experienced with Ollama beyond basic use.


@rick-github commented on GitHub (Mar 2, 2026):

Unfortunately I ran into problems using the model.

The model works with 0.17.4:

$ ollama run frob/qwen3-coder-next:latest hello
Hello! How can I help you today? 😊

The reason I asked about the version is that 'blk.0.ssm_in.weight' errors are from 0.15.5 (#14133). Is "ollama version is 0.17.4" the only output from ollama -v?
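
(For context: if the server binary were older than the client, ollama -v would print a second line. A hypothetical mismatch, with a stale 0.15.5 server still running behind a 0.17.4 client, would look like:

$ ollama -v
ollama version is 0.15.5
Warning: client version is 0.17.4

A single line suggests client and server agree.)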

Like TTDiang2, I'm not a native English speaker, so I have to ask Claude every now and then if my writing is OK.

Translation is fine. The issue is that posting AI-generated content that is incorrect gives it an authoritative feel, which can mislead others.

And I am not very experienced with Ollama beyond basic use.

No problem, that's why questions are welcomed.


@terjenauf commented on GitHub (Mar 2, 2026):

Thanks, then I will wait in excitement for #14134 to be closed 👍


@rick-github commented on GitHub (Mar 2, 2026):

#14134 will not fix blk.0.ssm_in.weight errors.
