[GH-ISSUE #14413] Model request for Qwen3.5-35B-A3B-base #9357

Open
opened 2026-04-12 22:13:31 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @olumolu on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14413

[Model request for Qwen3.5-35B-A3B-Base](https://huggingface.co/Qwen/Qwen3.5-35B-A3B-Base)

GiteaMirror added the model label 2026-04-12 22:13:31 -05:00
Author
Owner

@VideoFX commented on GitHub (Feb 25, 2026):

It's in the ollama library now, but needs 0.17.1-rc1: https://github.com/ollama/ollama/releases/tag/v0.17.1-rc1

Author
Owner

@FieldMouse-AI commented on GitHub (Feb 25, 2026):

I believe I see where the problem is!

Root Cause: It appears Ollama 0.17.1-rc1 does not yet recognize the new qwen35moe architecture label used in recent Qwen 3.5 MoE GGUF builds.

unknown model architecture: 'qwen35moe'

Bonus Points: Also, for non-MoE models of the Qwen 3.5 series, the architecture should be `qwen35`.

Note: I verified this is systemic. Non-MoE models (like the 27B) also fail with:

unknown model architecture: 'qwen35'
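
As a quick way to confirm which architecture string a downloaded GGUF actually declares, independent of Ollama, the file's key/value header can be read directly. Below is a minimal, illustrative reader based on the published GGUF v3 layout — this is a sketch, not Ollama or llama.cpp code, and `gguf_architecture` is a made-up name:

```python
import struct

# Byte widths of GGUF scalar value types (GGUF v3); 8 = string, 9 = array.
_SCALAR = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def _read_str(f):
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8")

def _skip_value(f, vtype):
    if vtype in _SCALAR:
        f.read(_SCALAR[vtype])
    elif vtype == 8:                      # string: uint64 length + bytes
        _read_str(f)
    elif vtype == 9:                      # array: element type, count, elements
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        for _ in range(count):
            _skip_value(f, etype)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")

def gguf_architecture(path):
    """Return the 'general.architecture' string from a GGUF file, or None."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version, _n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        for _ in range(n_kv):
            key = _read_str(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            if key == "general.architecture" and vtype == 8:
                return _read_str(f)
            _skip_value(f, vtype)
    return None
```

Run against the MoE blob, this should print the same `qwen35moe` string that appears in the load error below.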

Problem Description

I found that while the Ollama team has a version of `qwen3.5:35b-q4_K_M` in the Ollama library, and it works, it is too large to fit in my 24 GB of VRAM.

Further, when I originally tried to create my own `qwen3.5-35b-a3b:Q4_K_M` from the GGUF on HuggingFace, my version came out to only 20 GB. But, for reasons unknown to me, I get `Error: 500 Internal Server Error: unable to load model: /app/ollama/models/blobs/sha256-d9610e733abcf0ccad50cfabed78ae7a1d48b1a5c53999a0b467f89aa6ee9dd9` every time I try to `ollama run` the model.

This failure occurs consistently for me with the following versions of Ollama:

  • 0.15.5
  • 0.17.0
  • 0.17.1-rc1
  • 0.17-1 👈UPDATE 2026-02-26
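
For reference, importing a local GGUF into Ollama follows the documented Modelfile pattern. A minimal sketch — the local GGUF filename here is a hypothetical placeholder, not copied from the setup above:

```
# Modelfile — FROM points at the locally downloaded GGUF (hypothetical filename)
FROM ./Qwen3.5-35B-A3B-Base-Q4_K_M.gguf
```

followed by `ollama create qwen3.5-35b-a3b:Q4_K_M -f Modelfile`. Note that, as the transcripts below show, `ollama create` only parses the GGUF and succeeds; the architecture check happens when the model is loaded, which would explain why the failure only surfaces on `ollama run`.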

What is available from Ollama

[Image](https://github.com/user-attachments/assets/c5953df4-13a0-49bd-bd6f-6b93a20bfd6d)
```
ollama-user@98ef7bc77ad4:/mywork/qwen3.5$ ollama run qwen3.5:35b-q4_K_M
pulling manifest
pulling 2abd0d805943: 100% ▕███████████████████████████████████████████████████████████████████████████▏  23 GB
pulling 7339fa418c9a: 100% ▕███████████████████████████████████████████████████████████████████████████▏  11 KB
pulling f6417cb1e269: 100% ▕███████████████████████████████████████████████████████████████████████████▏   42 B
pulling 9850298a701d: 100% ▕███████████████████████████████████████████████████████████████████████████▏  482 B
verifying sha256 digest
writing manifest
success
>>> /set verbose
Set 'verbose' mode.
>>> /set nothink
Set 'nothink' mode.
>>> hello
Hello! How can I help you today?

total duration:       3.964958536s
load duration:        237.468563ms
prompt eval count:    13 token(s)
prompt eval duration: 3.040116725s
prompt eval rate:     4.28 tokens/s
eval count:           10 token(s)
eval duration:        624.671208ms
eval rate:            16.01 tokens/s
>>>
ollama-user@98ef7bc77ad4:/mywork/qwen3.5$ ollama ps
NAME                  ID              SIZE     PROCESSOR          CONTEXT    UNTIL
qwen3.5:35b-q4_K_M    4af949f8bdf0    27 GB    17%/83% CPU/GPU    32768      4 minutes from now
```

My output from my ollama create

```
# Creating model: qwen3.5-35b-a3b:Q4_K_M
deleted 'qwen3.5-35b-a3b:Q4_K_M'
gathering model components
copying file sha256:d9610e733abcf0ccad50cfabed78ae7a1d48b1a5c53999a0b467f89aa6ee9dd9 100%
parsing GGUF
using existing layer sha256:d9610e733abcf0ccad50cfabed78ae7a1d48b1a5c53999a0b467f89aa6ee9dd9
using existing layer sha256:64631f1262e4e87d47511bb7b405540321afd297f723f88bf72faae19992ddba
using existing layer sha256:c18ba48e9e3f2535a69d74fc019772a0fb5ad9ccd27eeb86fc686278665d92ae
writing manifest
success

ollama-user@98ef7bc77ad4:/mywork/qwen3.5$ ollama run qwen3.5-35b-a3b:q4_k_m
Error: 500 Internal Server Error: unable to load model: /app/ollama/models/blobs/sha256-d9610e733abcf0ccad50cfabed78ae7a1d48b1a5c53999a0b467f89aa6ee9dd9
ollama-user@98ef7bc77ad4:/mywork/qwen3.5$
ollama-user@98ef7bc77ad4:/mywork/qwen3.5$ ollama --version
ollama version is 0.17.1-rc1

ollama-user@98ef7bc77ad4:/mywork/qwen3.5$ ll -h /app/ollama/models/blobs/sha256-d9610e733abcf0ccad50cfabed78ae7a1d48b1a5c53999a0b467f89aa6ee9dd9
-rw-r--r-- 1 ollama-user ollama-user 20G Feb 25 21:55 /app/ollama/models/blobs/sha256-d9610e733abcf0ccad50cfabed78ae7a1d48b1a5c53999a0b467f89aa6ee9dd9
```

😭Log file from my ollama create and failed ollama run

I believe the relevant failure in the log is here:

```
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 19.76 GiB (4.90 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load_from_file_impl: failed to load model
time=2026-02-26T03:03:56.607+09:00 level=INFO source=sched.go:473 msg="NewLlamaServer failed" model=/app/ollama/models/blobs/sha256-d9610e733abcf0ccad50cfabed78ae7a1d48b1a5c53999a0b467f89aa6ee9dd9 error="unable to load model: /app/ollama/models/blobs/sha256-d9610e733abcf0ccad50cfabed78ae7a1d48b1a5c53999a0b467f89aa6ee9dd9"
[GIN] 2026/02/26 - 03:03:56 | 500 |  1.158766051s |       127.0.0.1 | POST     "/api/generate"
```

[qwen3.5_failure.log](https://github.com/user-attachments/files/25553692/qwen3.5_failure.log)

What I would LOVE💚 to see!

Since I am using `ollama create`, I can build even smaller quantizations of Qwen3.5-35B-A3B.

😭 Sadly, now, none of them work.

I just want to be able to use the models that I create since they would actually fit.

If you have any questions or comments, please ask.
Thanks in advance folks!

Author
Owner

@rick-github commented on GitHub (Feb 27, 2026):

It's unlikely the `base` version will be added to the ollama library. When ollama [supports](#14503) qwen35 models, one downloaded from HF can be [imported](https://github.com/ollama/ollama/blob/main/docs/import.mdx).

Reference: github-starred/ollama#9357