[GH-ISSUE #14730] unknown model architecture: 'qwen35moe' when loading imported GGUF with mmproj (vision projector) #71585

Closed
opened 2026-05-05 02:13:09 -05:00 by GiteaMirror · 0 comments

Originally created by @mirifiuto135-debug on GitHub (Mar 9, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14730

Description

Imported Qwen3.5-35B-A3B GGUF models fail to load when a vision projector (mmproj)
file is attached. The same model loads fine for text-only (without mmproj), and
loads fine with mmproj via llama.cpp's --mmproj flag.

Ollama version

0.17.7

Steps to reproduce

  1. Download a community Qwen 3.5 GGUF (e.g., from
    llmfan46/Qwen3.5-35B-A3B-heretic-v2-GGUF) and its mmproj file
    (Qwen3.5-35B-A3B-mmproj-BF16.gguf)
  2. Create a Modelfile:
    FROM Qwen3.5-35B-A3B-heretic-v2-Q5_K_M.gguf
    FROM Qwen3.5-35B-A3B-mmproj-BF16.gguf
    TEMPLATE """{{ .Prompt }}"""
  3. ollama create qwen3.5:test -f Modelfile → succeeds
  4. ollama run qwen3.5:test → fails

Also tried ADAPTER instead of the second FROM; same result. A scripted version of the repro is shown below.
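
A minimal end-to-end sketch of steps 1-4 as a shell script, assuming both GGUF files have already been downloaded into the working directory:

# Recreate the Modelfile from step 2, then run steps 3-4.
cat > Modelfile <<'EOF'
FROM Qwen3.5-35B-A3B-heretic-v2-Q5_K_M.gguf
FROM Qwen3.5-35B-A3B-mmproj-BF16.gguf
TEMPLATE """{{ .Prompt }}"""
EOF
ollama create qwen3.5:test -f Modelfile   # succeeds
ollama run qwen3.5:test "hello"           # fails with the error below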

Error

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
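
The architecture strings recorded in each file can be confirmed from the GGUF metadata itself, e.g. with the gguf-dump tool shipped in the gguf Python package (a sketch; the grep is only to trim the output):

# Expect 'qwen35moe' in the main model and 'clip' in the projector,
# matching the families listed in the notes below.
pip install gguf
gguf-dump Qwen3.5-35B-A3B-heretic-v2-Q5_K_M.gguf | grep general.architecture
gguf-dump Qwen3.5-35B-A3B-mmproj-BF16.gguf | grep general.architecture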

Expected behavior

The model should load with vision support, the same as it does with llama.cpp:

llama-server -m Qwen3.5-35B-A3B-heretic-v2-Q5_K_M.gguf --mmproj Qwen3.5-35B-A3B-mmproj-BF16.gguf -c 4096

This works: both text and vision are functional.

Notes

  • Without mmproj, the model loads fine for text (families: ['qwen35moe'])
  • With mmproj, families becomes ['qwen35moe', 'clip'] and loading fails (a
    way to verify the recorded families is shown after this list)
  • The official qwen3.5:35b works with vision because it has native
    qwen35moe.vision.* tensors embedded in the main GGUF — no clip involved
  • PR #14517 fixed text-only loading of imported qwen35moe GGUFs but the
    multimodal/clip runner path was not updated for this architecture
  • GPU: 2x RTX 5060 16GB
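
The families values quoted in the first two notes can be read back from a running Ollama instance via its REST API. A sketch assuming jq is installed (older Ollama versions expect "name" instead of "model" in the request body):

# Show the details Ollama recorded for the imported model;
# .details.families should include both 'qwen35moe' and 'clip'
# once the mmproj is attached.
curl -s http://localhost:11434/api/show \
  -d '{"model": "qwen3.5:test"}' | jq '.details.families'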