[GH-ISSUE #14419] qwen3.5:35b Error: 500 Internal Server Error #55873

Closed
opened 2026-04-29 09:50:18 -05:00 by GiteaMirror · 52 comments
Owner

Originally created by @shoque88 on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14419

What is the issue?

ollama run qwen3.5:35b
pulling manifest
pulling d838916ba05b: 100% ▕██████████████████████████████████████████████████████▏ 23 GB
pulling 7339fa418c9a: 100% ▕██████████████████████████████████████████████████████▏ 11 KB
pulling f6417cb1e269: 100% ▕██████████████████████████████████████████████████████▏ 42 B
pulling d6d7dd44ab69: 100% ▕██████████████████████████████████████████████████████▏ 502 B
verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: unable to load model

Relevant log output


OS

WSL

GPU

RTX 5090

CPU

R7 7800x3d

Ollama version

ollama version is 0.17.0

GiteaMirror added the bug label 2026-04-29 09:50:18 -05:00

@wangzhongliang commented on GitHub (Feb 25, 2026):

same issue here


@rick-github commented on GitHub (Feb 25, 2026):

0.17.1 for qwen3.5 support.


@wangzhongliang commented on GitHub (Feb 25, 2026):

> 0.17.1 for qwen3.5 support.

I solved it by installing 0.17.1-rc1.


@shoque88 commented on GitHub (Feb 25, 2026):

When I run this command: `curl -fsSL https://ollama.com/install.sh | sh`, it installs 0.17.0.
How should I update to 0.17.1?


@rawzone commented on GitHub (Feb 25, 2026):

> when I run this command: curl -fsSL https://ollama.com/install.sh | sh it installs 0.17.0 how should I update to 0.17.1?

`curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc1 sh`

@shoque88 this seems to install a specific version.


@shoque88 commented on GitHub (Feb 25, 2026):

Thank you, solved


@SebastianGode commented on GitHub (Feb 25, 2026):

Sadly, it will not work with `ollama run hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL` yet, even on 0.17.1-rc1. Only the Q4_K_M variant works fine.


@rick-github commented on GitHub (Feb 25, 2026):

qwen3.5 models downloaded from HF run on the llama.cpp engine; support there will come with the next vendor sync.


@richardng1505 commented on GitHub (Feb 25, 2026):

Use https://github.com/ollama/ollama/releases/tag/v0.17.1-rc1; it works.


@Snify89 commented on GitHub (Feb 25, 2026):

Edit: my context size was too high, I guess. After lowering it, it worked.

I can't get it to work with 0.17.1-rc1 (Qwen3.5 35B A3B Q4_K_M):

time=2026-02-25T14:12:06.438Z level=INFO source=routes.go:1663 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:20000 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[moz-extension://* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-02-25T14:12:06.438Z level=INFO source=routes.go:1665 msg="Ollama cloud disabled: false"
time=2026-02-25T14:12:06.440Z level=INFO source=images.go:473 msg="total blobs: 33"
time=2026-02-25T14:12:06.440Z level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-25T14:12:06.441Z level=INFO source=routes.go:1718 msg="Listening on [::]:11434 (version 0.17.1-rc1)"
time=2026-02-25T14:12:06.441Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-25T14:12:06.441Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-02-25T14:12:06.442Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39011"
time=2026-02-25T14:12:06.665Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36591"
time=2026-02-25T14:12:06.877Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38485"
time=2026-02-25T14:12:07.098Z level=INFO source=types.go:42 msg="inference compute" id=GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" libdirs=ollama,cuda_v12 driver=12.9 pci_id=0000:06:00.0 type=discrete total="12.0 GiB" available="11.6 GiB"
time=2026-02-25T14:12:07.098Z level=INFO source=routes.go:1768 msg="vram-based default context" total_vram="12.0 GiB" default_num_ctx=4096
[GIN] 2026/02/25 - 14:12:07 | 200 | 2.290428ms | 127.0.0.1 | GET "/v1/models"
time=2026-02-25T14:12:07.321Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45629"
time=2026-02-25T14:12:07.543Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing "max": invalid syntax"
time=2026-02-25T14:12:07.724Z level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-25T14:12:07.724Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a --port 34401"
time=2026-02-25T14:12:07.724Z level=INFO source=sched.go:491 msg="system memory" total="15.5 GiB" free="15.3 GiB" free_swap="0 B"
time=2026-02-25T14:12:07.724Z level=INFO source=sched.go:498 msg="gpu memory" id=GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 library=CUDA available="11.2 GiB" free="11.6 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-25T14:12:07.724Z level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1
time=2026-02-25T14:12:07.737Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-25T14:12:07.737Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:34401"
time=2026-02-25T14:12:07.746Z level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:20000 KvCacheType: NumThreads:8 GPULayers:41[ID:GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T14:12:07.816Z level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1959 num_key_values=56
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-02-25T14:12:07.985Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-02-25T14:12:09.210Z level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:20000 KvCacheType: NumThreads:8 GPULayers:18[ID:GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 Layers:18(22..39)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T14:12:09.897Z level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:20000 KvCacheType: NumThreads:8 GPULayers:18[ID:GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 Layers:18(22..39)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T14:12:11.218Z level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:20000 KvCacheType: NumThreads:8 GPULayers:18[ID:GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 Layers:18(22..39)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T14:12:11.218Z level=INFO source=ggml.go:482 msg="offloading 18 repeating layers to GPU"
time=2026-02-25T14:12:11.218Z level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-25T14:12:11.218Z level=INFO source=ggml.go:494 msg="offloaded 18/41 layers to GPU"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.1 GiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:245 msg="model weights" device=CPU size="13.1 GiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="878.0 MiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="1.1 GiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="916.2 MiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="214.6 MiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:272 msg="total memory" size="25.2 GiB"
time=2026-02-25T14:12:11.218Z level=INFO source=sched.go:566 msg="loaded runners" count=1
time=2026-02-25T14:12:11.218Z level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-25T14:12:11.218Z level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-25T14:12:20.980Z level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server not responding"
time=2026-02-25T14:12:22.165Z level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server error"
time=2026-02-25T14:12:22.415Z level=ERROR source=sched.go:572 msg="error loading llama server" error="llama runner process has terminated: signal: killed"
[GIN] 2026/02/25 - 14:12:22 | 500 | 15.318145428s | 192.168.178.48 | POST "/v1/chat/completions"
time=2026-02-25T14:12:25.419Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38265"
time=2026-02-25T14:12:26.058Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37819"
time=2026-02-25T14:12:26.309Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35919"
time=2026-02-25T14:12:26.559Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44635"
time=2026-02-25T14:12:26.808Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44417"
time=2026-02-25T14:12:27.059Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 46607"
time=2026-02-25T14:12:27.308Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40303"
time=2026-02-25T14:12:27.559Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35357"
time=2026-02-25T14:12:27.809Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38131"
time=2026-02-25T14:12:28.059Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43583"
time=2026-02-25T14:12:28.309Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43735"
time=2026-02-25T14:12:28.559Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40861"
time=2026-02-25T14:12:28.808Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33921"
time=2026-02-25T14:12:29.058Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34741"
time=2026-02-25T14:12:29.309Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37543"
time=2026-02-25T14:12:29.558Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41577"
time=2026-02-25T14:12:29.808Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40053"
time=2026-02-25T14:12:30.059Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37391"
time=2026-02-25T14:12:30.309Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45947"
time=2026-02-25T14:12:30.559Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42859"
time=2026-02-25T14:12:30.808Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38487"
time=2026-02-25T14:12:30.809Z level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[] error="failed to finish discovery before timeout"
time=2026-02-25T14:12:30.809Z level=WARN source=runner.go:356 msg="unable to refresh free memory, using old values"
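Snify89's edit matches the log above: the runner died with `signal: killed` (likely the kernel OOM killer, given 15.5 GiB of system RAM against a ~25.2 GiB total-memory estimate), and KV-cache memory grows linearly with context length, so lowering the context shrinks the footprint. A back-of-envelope sketch; the layer count, KV-head count, and head dimension below are illustrative assumptions, not the real qwen3.5:35b configuration:

```python
# Why lowering the context helped: KV-cache memory grows linearly with
# num_ctx. All model dimensions below are ASSUMPTIONS for illustration,
# not the actual qwen3.5:35b configuration.

def kv_cache_bytes(num_ctx: int,
                   n_layers: int = 40,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Keys + values: 2 tensors per layer, each num_ctx * n_kv_heads * head_dim."""
    return 2 * n_layers * num_ctx * n_kv_heads * head_dim * bytes_per_elem

for ctx in (20000, 8192, 4096):
    print(f"num_ctx={ctx:>6}: ~{kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

With these stand-in dimensions, dropping from 20,000 to 4,096 tokens of context cuts the cache to roughly a fifth.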


@sleeplessai commented on GitHub (Feb 25, 2026):

verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: unable to load model: C:\Users\<username>\.ollama\models\blobs\sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a

@richardng1505 thanks for the solution; 0.17.1-rc1 works well.


@rluisr commented on GitHub (Feb 25, 2026):

hmm...

root@octa:~# ollama run hf.co/unsloth/Qwen3.5-27B-GGUF
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-728960e4dda52d4f2af5bee09b2cbe86addfa93220fe9324bfac9dc727605c17
root@octa:~# vim /etc/systemd/system/ollama.service
root@octa:~# /usr/local/bin/ollama --version
ollama version is 0.17.1-rc1

@rick-github commented on GitHub (Feb 25, 2026):

@rluisr https://github.com/ollama/ollama/issues/14419#issuecomment-3959159035


@pitchinnate commented on GitHub (Feb 25, 2026):

Running ollama version `0.17.1-rc1`, I tried the 27B and 35B variants and both returned a 500 error. Not sure if this log output is helpful.

ollama run hf.co/unsloth/Qwen3.5-27B-GGUF:Q4_K_M

print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 15.58 GiB (4.98 BPW) 
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model

ollama run hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M

print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 19.74 GiB (4.89 BPW) 
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load_from_file_impl: failed to load model
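The `unknown model architecture: 'qwen35'` / `'qwen35moe'` errors mean the llama.cpp build bundled with this release doesn't recognize the architecture string stored in the GGUF header, which matches rick-github's note that HF-downloaded qwen3.5 models need the next vendor sync. You can inspect that string yourself before trying a new build; below is a minimal sketch of a GGUF metadata reader (little-endian GGUF v2/v3 only, based on the published GGUF layout; not a full parser):

```python
import struct

def gguf_architecture(path: str) -> str:
    """Return the `general.architecture` string from a GGUF file header.

    Minimal reader for little-endian GGUF v2/v3; just enough to see which
    architecture string (e.g. 'qwen35moe') llama.cpp will be asked to load.
    """
    # Fixed byte widths for scalar metadata value types (GGUF type ids).
    scalar = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
    with open(path, "rb") as f:
        u32 = lambda: struct.unpack("<I", f.read(4))[0]
        u64 = lambda: struct.unpack("<Q", f.read(8))[0]
        string = lambda: f.read(u64()).decode("utf-8")

        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        u32()            # version
        u64()            # tensor count
        n_kv = u64()     # number of metadata key/value pairs

        for _ in range(n_kv):
            key, vtype = string(), u32()
            if vtype == 8:                      # string value
                value = string()
                if key == "general.architecture":
                    return value
            elif vtype == 9:                    # array: element type, count, elements
                etype, count = u32(), u64()
                if etype == 8:
                    for _ in range(count):
                        f.seek(u64(), 1)        # skip each string body
                else:
                    f.seek(scalar[etype] * count, 1)
            else:                               # fixed-size scalar
                f.seek(scalar[vtype], 1)
    raise ValueError("general.architecture key not found")
```

If the reported architecture isn't one your llama.cpp build knows, no quantization variant of that file will load; only a build that adds the architecture helps.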

@SebastianGode commented on GitHub (Feb 25, 2026):

@pitchinnate scroll a bit up here :)

<!-- gh-comment-id:3960997650 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@pitchinnate https://github.com/ollama/ollama/issues/14419#issuecomment-3959159035

<!-- gh-comment-id:3961077576 -->
Author
Owner

@sleeplessai commented on GitHub (Feb 25, 2026):

hmm...

root@octa:~# ollama run hf.co/unsloth/Qwen3.5-27B-GGUF
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-728960e4dda52d4f2af5bee09b2cbe86addfa93220fe9324bfac9dc727605c17
root@octa:~# vim /etc/systemd/system/ollama.service
root@octa:~# /usr/local/bin/ollama --version
ollama version is 0.17.1-rc1

It only works with `ollama run qwen3.5:35b`.

<!-- gh-comment-id:3961100471 -->
Author
Owner

@pitchinnate commented on GitHub (Feb 25, 2026):

Sorry I misunderstood this comment then:

It will not work with ollama run hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL sadly, even on 0.17.1-rc1 yet. Only the Q4_K_M variant works fine.

I thought they were saying the hf Q4_K_M variant worked.

<!-- gh-comment-id:3961103123 -->
Author
Owner

@filmo commented on GitHub (Feb 25, 2026):

I'm getting a very similar error with RTX-3090s. I've 'uninstalled' it via 'ollama rm' and by manually deleting the sha file in models and then re-installing. Neither solved the problem:

Operating System: Debian GNU/Linux 12 (bookworm)  
          Kernel: Linux 6.8.12-11-pve
    Architecture: x86-64


root@openwebui:~# ollama -v
ollama version is 0.17.0

root@openwebui:~# nvidia-smi 
Wed Feb 25 10:22:35 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0 Off |                  N/A |
|  0%   37C    P8             25W /  300W |       4MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  |   00000000:05:00.0 Off |                  N/A |
| 30%   21C    P8             31W /  300W |       4MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        On  |   00000000:08:00.0 Off |                  N/A |
| 33%   22C    P8             20W /  300W |       4MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        On  |   00000000:09:00.0 Off |                  N/A |
| 32%   21C    P8             20W /  300W |       4MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+

root@openwebui:~/.ollama/models/blobs# ollama pull qwen3.5:35b
pulling manifest 
pulling d838916ba05b: 100%                 
verifying sha256 digest 
writing manifest 
success 

root@openwebui:~/.ollama/models/blobs# ollama run qwen3.5:35b
Error: 500 Internal Server Error: unable to load model: /root/.ollama/models/blobs/sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a

I have not tried release candidate 0.17.1

<!-- gh-comment-id:3961246569 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@filmo https://github.com/ollama/ollama/issues/14419#issuecomment-3958763362

<!-- gh-comment-id:3961261177 -->
Author
Owner

@davidvv commented on GitHub (Feb 25, 2026):

Same issue here on macOS with Ollama 0.17.0 and Apple M4 Pro. Getting Error: 500 Internal Server Error: unable to load model when trying to run qwen3.5:35b. The model file exists but fails to load. Other models like glm-4.7-flash work fine.

<!-- gh-comment-id:3961510029 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@davidvv https://github.com/ollama/ollama/issues/14419#issuecomment-3958763362

<!-- gh-comment-id:3961518007 -->
Author
Owner

@VideoFX commented on GitHub (Feb 25, 2026):

also, I can't disable thinking. 0.17.1-rc1

<!-- gh-comment-id:3961557402 -->
Author
Owner

@Avaruz commented on GitHub (Feb 25, 2026):

I got this error:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load_from_file_impl: failed to load model

ollama version is 0.17.0
2 NVIDIA GTX3090 24 GB each

<!-- gh-comment-id:3961802165 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@Avaruz https://github.com/ollama/ollama/issues/14419#issuecomment-3958763362

<!-- gh-comment-id:3961811106 -->
Author
Owner

@alnaranjo commented on GitHub (Feb 25, 2026):

also, I can't disable thinking. 0.17.1-rc1

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc2 sh

<!-- gh-comment-id:3961813108 -->
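For anyone wanting to pin a specific build rather than wait for an auto-update, the `OLLAMA_VERSION` override used in the command above can be followed by a quick check of what actually landed on `PATH`. A sketch (the version-string format is the `ollama version is ...` line quoted throughout this thread):

```shell
# Install the rc build the commenters reference.
# OLLAMA_VERSION is the install script's version override.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc2 sh

# `ollama --version` prints "ollama version is X.Y.Z[-rcN]"; the last
# whitespace-separated field is the version itself.
ver_line="$(ollama --version)"
ver="${ver_line##* }"
echo "$ver"
```

If the printed version is still 0.17.0, an older binary earlier on `PATH` (or a still-running old server) is being picked up.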
Author
Owner

@VideoFX commented on GitHub (Feb 25, 2026):

also, I can't disable thinking. 0.17.1-rc1

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc2 sh

Thank you, I see updates 2 hours ago.

<!-- gh-comment-id:3962204796 -->
Author
Owner

@YJesus commented on GitHub (Feb 25, 2026):

ollama --version
ollama version is 0.17.1-rc2

ollama run hf.co/unsloth/Qwen3.5-27B-GGUF:Q6_K
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-b0700c220e072828ea194df6f5679adda6ffaf165b13cd3be91834efdd360361

ollama run hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:IQ4_XS
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-4ddc2fdfa0a6825967bbc1d3bddf703045523df60bbc26ba1f30db181850e4a7

<!-- gh-comment-id:3962977362 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@YJesus https://github.com/ollama/ollama/issues/14419#issuecomment-3959159035

<!-- gh-comment-id:3962983846 -->
Author
Owner

@VideoFX commented on GitHub (Feb 26, 2026):

also, I can't disable thinking. 0.17.1-rc1

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc2 sh

Thank you, I see updates 2 hours ago.

I was able to disable thinking after updating to "qwen3.5:35b-a3b-q4_K_M" and 0.17.1-rc2

<!-- gh-comment-id:3963271454 -->
Author
Owner

@0x7CFE commented on GitHub (Feb 26, 2026):

In my case 0.17.1-rc2 and the more recent 0.17.1 both crash when trying to use the model with OLLAMA_VULKAN=1. CPU-only works fine.

$ ollama --version
ollama version is 0.17.1
$ ollama run qwen3.5:35b
>>> /set verbose
Set 'verbose' mode.
>>> Tell me about yourself.
Error: 500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details
фев 26 10:58:04 fw13 ollama[878128]: SIGSEGV: segmentation violation
фев 26 10:58:04 fw13 ollama[878128]: PC=0x72394d6f057f m=19 sigcode=1 addr=0x669000
фев 26 10:58:04 fw13 ollama[878128]: signal arrived during cgo execution
фев 26 10:58:04 fw13 ollama[878128]: goroutine 2160 gp=0xc000f22c40 m=19 mp=0xc00058c808 [syscall]:
фев 26 10:58:04 fw13 ollama[878128]: runtime.cgocall(0x6143cf309330, 0xc00175daa0)
фев 26 10:58:04 fw13 ollama[878128]:         runtime/cgocall.go:167 +0x4b fp=0xc00175da78 sp=0xc00175da40 pc=0x6143ce3bba6b
фев 26 10:58:04 fw13 ollama[878128]: github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x614400d23090, 0x723894001c20)
фев 26 10:58:04 fw13 ollama[878128]:         _cgo_gotypes.go:979 +0x4a fp=0xc00175daa0 sp=0xc00175da78 pc=0x6143ce8a6b0a
фев 26 10:58:04 fw13 ollama[878128]: github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify.func2(...)
фев 26 10:58:04 fw13 ollama[878128]:         github.com/ollama/ollama/ml/backend/ggml/ggml.go:825
фев 26 10:58:04 fw13 ollama[878128]: github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify(0xc000f77580, 0xc00164c280?, {0xc00185f7b0, 0x1, 0x2?})
фев 26 10:58:04 fw13 ollama[878128]:         github.com/ollama/ollama/ml/backend/ggml/ggml.go:825 +0x1b2 fp=0xc00175db78 sp=0xc00175daa0 pc=0x6143ce8b5492
фев 26 10:58:04 fw13 ollama[878128]: github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc00023b0e0, {0x0, {0x6143cfc609d0, 0xc000f77580}, {0x6143cfc6de30, 0xc00174e048}, {0xc0016e8f00, 0xf, 0x10}, {{0x6143cfc6de30, ...}, ...}, ...})
фев 26 10:58:04 fw13 ollama[878128]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:716 +0x862 fp=0xc00175def0 sp=0xc00175db78 pc=0x6143ce9e0282
<!-- gh-comment-id:3964315496 -->
Author
Owner

@yezhoujie commented on GitHub (Feb 26, 2026):

Same error for me on a MacBook Pro M3 Max 36 GB

ollama run qwen3.5-27b:latest
Error: 500 Internal Server Error: unable to load model: /Users/yzj/.ollama/models/blobs/sha256-4086d669ca6a7a8777ae8fd8506bf971abd2aece175b65354c63dffb6cc574f2
<!-- gh-comment-id:3964393934 -->
Author
Owner

@j3g commented on GitHub (Feb 26, 2026):

on version 0.17.1

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'

<!-- gh-comment-id:3964929242 -->
Author
Owner

@trollize commented on GitHub (Feb 26, 2026):

Do you have the mmproj projection file for Qwen3.5-27B in the same directory?

mmproj-BF16.gguf

I don't have it, and the error log shows:

time=2026-02-25T13:39:54.263-05:00 level=INFO source=sched.go:498 msg="gpu memory" id=GPU-24ceeb4c-8772-4bc9-7d0a-b91e6f600874 library=CUDA available="14.3 GiB" free="14.7 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-25T13:39:54.263-05:00 level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1
time=2026-02-25T13:39:54.300-05:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-25T13:39:54.301-05:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:51684"
time=2026-02-25T13:39:54.307-05:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:41[ID:GPU-24ceeb4c-8772-4bc9-7d0a-b91e6f600874 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T13:39:54.333-05:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1959 num_key_values=56
////
[GIN] 2026/02/25 - 13:52:05 | 200 | 15.2596ms | 10.0.0.154 | GET "/api/tags"
[GIN] 2026/02/25 - 13:52:11 | 200 | 0s | 10.0.0.154 | HEAD "/"
[GIN] 2026/02/25 - 13:52:11 | 404 | 7.8041ms | 10.0.0.154 | POST "/api/show"
time=2026-02-25T13:52:12.265-05:00 level=INFO source=download.go:179 msg="downloading a98ab071b984 in 16 979 MB part(s)"
[GIN] 2026/02/25 - 13:52:26 | 204 | 0s | 10.0.0.154 | OPTIONS "/api/chat"
time=2026-02-25T13:52:26.776-05:00 level=WARN source=types.go:976 msg="invalid option provided" option=keep_alive
time=2026-02-25T13:52:26.803-05:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\Users\Admin\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 65050"
time=2026-02-25T13:52:26.975-05:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-25T13:52:26.975-05:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-02-25T13:52:26.975-05:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=20 efficiency=12 threads=20
llama_model_loader: loaded meta data with 42 key-value pairs and 851 tensors from E:\Users\AI\.ollama\models\blobs\sha256-eefa528ef8e8948a3b177e562cbd7554c8536914fd9278ac4eaf83bcceff1b27 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 0.600000
llama_model_loader: - kv 5: general.name str = Qwen3.5-27B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-27B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 27B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-2...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.tags arr[str,1] = ["image-text-to-text"]
llama_model_loader: - kv 13: qwen35.block_count u32 = 64
llama_model_loader: - kv 14: qwen35.context_length u32 = 262144
llama_model_loader: - kv 15: qwen35.embedding_length u32 = 5120
llama_model_loader: - kv 16: qwen35.feed_forward_length u32 = 17408
llama_model_loader: - kv 17: qwen35.attention.head_count u32 = 24
llama_model_loader: - kv 18: qwen35.attention.head_count_kv u32 = 4
llama_model_loader: - kv 19: qwen35.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 20: qwen35.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 21: qwen35.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 22: qwen35.attention.key_length u32 = 256
llama_model_loader: - kv 23: qwen35.attention.value_length u32 = 256
llama_model_loader: - kv 24: qwen35.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 25: qwen35.ssm.state_size u32 = 128
llama_model_loader: - kv 26: qwen35.ssm.group_count u32 = 16
llama_model_loader: - kv 27: qwen35.ssm.time_step_rank u32 = 48
llama_model_loader: - kv 28: qwen35.ssm.inner_size u32 = 6144
llama_model_loader: - kv 29: qwen35.full_attention_interval u32 = 4
llama_model_loader: - kv 30: qwen35.rope.dimension_count u32 = 64
llama_model_loader: - kv 31: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 32: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 33: tokenizer.ggml.tokens arr[str,248320] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 34: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 35: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 36: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 37: tokenizer.ggml.padding_token_id u32 = 248044
llama_model_loader: - kv 38: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 39: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 40: general.quantization_version u32 = 2
llama_model_loader: - kv 41: general.file_type u32 = 14
llama_model_loader: - type f32: 353 tensors
llama_model_loader: - type q8_0: 96 tensors
llama_model_loader: - type q4_K: 341 tensors
llama_model_loader: - type q5_K: 60 tensors
llama_model_loader: - type q6_K: 1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Small
print_info: file size = 14.68 GiB (4.69 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model

<!-- gh-comment-id:3965116391 -->
Author
Owner

@j-wiedemann commented on GitHub (Feb 26, 2026):

I'm a total noob at this...
I'm pretty sure qwen3.5:35b ran well on GPU with 0.17.1-rc1 yesterday. I mainly use it with Newelle.
But today I upgraded to 0.17.1 and it either returns error 500 or just runs on the CPU.
Is there an easy way to downgrade to 0.17.1-rc1? I'm on Ubuntu 25.10.

<!-- gh-comment-id:3966090969 -->
Author
Owner

@MAWK0235 commented on GitHub (Feb 26, 2026):

The current release of ollama fails to use qwen3.5 past a single chat before hitting memory issues (500 error).

(I meet the qwen3.5 system requirements.)

[Image: https://github.com/user-attachments/assets/4fe712c2-8b86-4538-9727-aff4fc231228]
<!-- gh-comment-id:3967280229 -->
Author
Owner

@Noyze-AI commented on GitHub (Feb 26, 2026):

[Image: https://github.com/user-attachments/assets/451b597f-aaf5-474d-9b80-872d9e3b7c09]

on version 0.17.1 5090dv2 win11

500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details

<!-- gh-comment-id:3968092702 -->
Author
Owner

@rick-github commented on GitHub (Feb 26, 2026):

Server logs will aid in debugging.

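For anyone pulling server logs for a crash like this: the lines worth pasting into a report are usually the `error loading model` / `NewLlamaServer failed` ones. A minimal sketch of filtering them with `grep`, run here against a simulated log excerpt (on Linux installs the real log comes from `journalctl -u ollama`; elsewhere, from the server console):

```shell
# Simulated excerpt of an ollama server log; in a real session you would
# feed the actual log file or `journalctl -u ollama --no-pager` output.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
time=... level=INFO source=server.go:431 msg="starting runner"
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
time=... level=INFO source=sched.go:473 msg="NewLlamaServer failed"
EOF

# Keep only the failure lines worth including in a bug report.
grep -E 'error loading model|NewLlamaServer failed' "$LOG"
```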

@MAWK0235 commented on GitHub (Feb 26, 2026):

The server logs only report the "500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details" exception.

Even with more verbosity I couldn't get more information from my system.

It looks like it panics and unloads the model from memory without leaving any more information on my end.


@MAWK0235 commented on GitHub (Feb 26, 2026):

I can triple check my log files when I get back to the workstation later


@rick-github commented on GitHub (Feb 26, 2026):

@Noyze-AI Different problem, open a new issue.


@sindab commented on GitHub (Feb 26, 2026):

PS C:\Users\Korisnik\Projekti\AI\qwen35> ollama --version
ollama version is 0.17.1
PS C:\Users\Korisnik\Projekti\AI\qwen35> ollama run hf.co/unsloth/Qwen3.5-27B-GGUF:Q4_K_M
Error: 500 Internal Server Error: unable to load model: C:\Users\Korisnik\.ollama\models\blobs\sha256-728960e4dda52d4f2af5bee09b2cbe86addfa93220fe9324bfac9dc727605c17


@rick-github commented on GitHub (Feb 26, 2026):

@sindab https://github.com/ollama/ollama/issues/14419#issuecomment-3959159035


@joenorton commented on GitHub (Feb 26, 2026):

Can confirm: qwen3.5:35b-a3b-q4_K_M throws a 500 error on ollama version 0.17.1.


@Matzebhv commented on GitHub (Feb 26, 2026):

The Windows version 0.17.1 runs fine with qwen3.5:35b here. Frigate and Home Assistant are connected as well, and everything runs as expected.
Ollama is running as a server on my AI Max+ 395 with 64 GB.

Image Image Image

@j3g commented on GitHub (Feb 26, 2026):

If you are on macOS, this exact file works in LM Studio. From my research, it's because the llama.cpp in Ollama is out of date (not the latest), while LM Studio's is current. Since you have already downloaded the model, you can give it a symbolic link and LM Studio will discover it immediately. Note that Ollama names files with SHA hashes, which makes life annoying; find the file that is 21 GB.
PS: The Zed editor automagically found the LM Studio local server, and I was using the model without any pain. Adding it in OpenCode was easy, too.

> mkdir -p ~/.cache/lm-studio/models/unsloth/Qwen3.5-35B-A3B-GGUF
> ln -s ~/.ollama/models/blobs/sha256-e8c60ba898493e3b8141c287ecb016c9bcaa9d8e745775ef26cc81511945a673 \
>   ~/.cache/lm-studio/models/unsloth/Qwen3.5-35B-A3B-GGUF/Q4_K_M.gguf

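To find the 21 GB blob without guessing hashes, sort the blob directory by size; the model weights come first. A self-contained sketch, demonstrated on a temporary directory here (in a real setup, point `BLOBS` at `~/.ollama/models/blobs`):

```shell
# Stand-in for ~/.ollama/models/blobs, so the snippet runs anywhere.
BLOBS=$(mktemp -d)
printf 'small' > "$BLOBS/sha256-aaa"
printf 'this one is the big model blob' > "$BLOBS/sha256-bbb"

# Largest file first; in a real store this is the model weights blob.
ls -S "$BLOBS" | head -n 1
```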

@rick-github commented on GitHub (Feb 26, 2026):

@iChristGit Different issue: #14444


@YJesus commented on GitHub (Feb 27, 2026):

> qwen3.5 models downloaded from HF run on the llama.cpp engine. Next vendor sync.

Is there any estimated timeline for the next sync? Thank you.


@SAXEM1997 commented on GitHub (Feb 27, 2026):

Failed to load model with "unknown model architecture: 'qwen35moe'" on ollama version 0.17.4.
The model is created from: https://huggingface.co/mradermacher/Qwen3.5-35B-A3B-heretic-GGUF/tree/main

time=2026-02-27T19:58:05.823+08:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\Users\59200\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 8834"
time=2026-02-27T19:58:05.957+08:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-27T19:58:05.957+08:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-02-27T19:58:05.957+08:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=14 efficiency=8 threads=20
llama_model_loader: loaded meta data with 50 key-value pairs and 733 tensors from D:\data\ai\models\ollama\blobs\sha256-499c2c0a0394da8c0c7c22a93850d75e743579f1320d4d056f1db28c2045aba5 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3.5 35B A3B Heretic
llama_model_loader: - kv 6: general.finetune str = heretic
llama_model_loader: - kv 7: general.basename str = Qwen3.5
llama_model_loader: - kv 8: general.size_label str = 35B-A3B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 11: general.tags arr[str,5] = ["heretic", "uncensored", "decensored...
llama_model_loader: - kv 12: qwen35moe.block_count u32 = 40
llama_model_loader: - kv 13: qwen35moe.context_length u32 = 262144
llama_model_loader: - kv 14: qwen35moe.embedding_length u32 = 2048
llama_model_loader: - kv 15: qwen35moe.attention.head_count u32 = 16
llama_model_loader: - kv 16: qwen35moe.attention.head_count_kv u32 = 2
llama_model_loader: - kv 17: qwen35moe.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 18: qwen35moe.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 19: qwen35moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 20: qwen35moe.expert_count u32 = 256
llama_model_loader: - kv 21: qwen35moe.expert_used_count u32 = 8
llama_model_loader: - kv 22: qwen35moe.attention.key_length u32 = 256
llama_model_loader: - kv 23: qwen35moe.attention.value_length u32 = 256
llama_model_loader: - kv 24: qwen35moe.expert_feed_forward_length u32 = 512
llama_model_loader: - kv 25: qwen35moe.expert_shared_feed_forward_length u32 = 512
llama_model_loader: - kv 26: qwen35moe.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 27: qwen35moe.ssm.state_size u32 = 128
llama_model_loader: - kv 28: qwen35moe.ssm.group_count u32 = 16
llama_model_loader: - kv 29: qwen35moe.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 30: qwen35moe.ssm.inner_size u32 = 4096
llama_model_loader: - kv 31: qwen35moe.full_attention_interval u32 = 4
llama_model_loader: - kv 32: qwen35moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 34: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,248320] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 39: tokenizer.ggml.padding_token_id u32 = 248044
llama_model_loader: - kv 40: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 41: general.quantization_version u32 = 2
llama_model_loader: - kv 42: general.file_type u32 = 15
llama_model_loader: - kv 43: general.url str = https://huggingface.co/mradermacher/Q...
llama_model_loader: - kv 44: mradermacher.quantize_version str = 2
llama_model_loader: - kv 45: mradermacher.quantized_by str = mradermacher
llama_model_loader: - kv 46: mradermacher.quantized_at str = 2026-02-26T08:06:07+01:00
llama_model_loader: - kv 47: mradermacher.quantized_on str = nico1
llama_model_loader: - kv 48: general.source.url str = https://huggingface.co/brayniac/Qwen3...
llama_model_loader: - kv 49: mradermacher.convert_type str = hf
llama_model_loader: - type f32: 301 tensors
llama_model_loader: - type q4_K: 355 tensors
llama_model_loader: - type q5_K: 30 tensors
llama_model_loader: - type q6_K: 47 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 19.71 GiB (4.88 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load_from_file_impl: failed to load model
time=2026-02-27T19:58:06.124+08:00 level=INFO source=sched.go:473 msg="NewLlamaServer failed" model=D:\data\ai\models\ollama\blobs\sha256-499c2c0a0394da8c0c7c22a93850d75e743579f1320d4d056f1db28c2045aba5 error="unable to load model: D:\data\ai\models\ollama\blobs\sha256-499c2c0a0394da8c0c7c22a93850d75e743579f1320d4d056f1db28c2045aba5"

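The `unknown model architecture` failure above can be confirmed without attempting a load: GGUF stores the `general.architecture` key in the metadata near the start of the file, so stripping the first few KB down to printable text is a crude but quick check (a heuristic, not a GGUF parser; the synthetic file below stands in for a real blob):

```shell
# Fake the first bytes of a GGUF blob: magic bytes, then the
# general.architecture key and its value, roughly as GGUF lays them out.
BLOB=$(mktemp)
printf 'GGUF\003\000\000\000general.architecture\010qwen35moe\n' > "$BLOB"

# Crude peek: replace non-printable bytes with newlines, then look for
# the architecture string. On a real blob, substitute its path for $BLOB.
head -c 4096 "$BLOB" | tr -c '[:print:]' '\n' | grep -x 'qwen35moe'
```

If the reported architecture is one the installed engine does not know (here `qwen35moe`), no amount of reloading will help; only a newer runner can.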

@rick-github commented on GitHub (Feb 27, 2026):

More information on HF qwen3.5 and Ollama in #14503. Closing this, as the initial problem of loading qwen3.5 is resolved by using 0.17.1+.


@kuolung1 commented on GitHub (Mar 30, 2026):

Same problem in 0.18.3 when loading:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
ollama run hf.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q6_K


@jonmach commented on GitHub (Mar 31, 2026):

I get the same error on 0.19.0:

$ ollama run moophlo/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
Error: 500 Internal Server Error: unable to load model: /Users/xxx/.ollama/models/blobs/sha256-c071a7725ffcafb8c4faf41b3327f2cd5e162308bdd3431d6232d0405dc1d905


Reference: github-starred/ollama#55873