[GH-ISSUE #14419] qwen3.5:35b Error: 500 Internal Server Error #55873

Closed
opened 2026-04-29 09:50:18 -05:00 by GiteaMirror · 52 comments
Owner

Originally created by @shoque88 on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14419

What is the issue?

ollama run qwen3.5:35b
pulling manifest
pulling d838916ba05b: 100% ▕██████████████████████████████████████████████████████▏ 23 GB
pulling 7339fa418c9a: 100% ▕██████████████████████████████████████████████████████▏ 11 KB
pulling f6417cb1e269: 100% ▕██████████████████████████████████████████████████████▏ 42 B
pulling d6d7dd44ab69: 100% ▕██████████████████████████████████████████████████████▏ 502 B
verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: unable to load model

Relevant log output


OS

WSL

GPU

RTX 5090

CPU

R7 7800x3d

Ollama version

ollama version is 0.17.0

GiteaMirror added the bug label 2026-04-29 09:50:18 -05:00

@wangzhongliang commented on GitHub (Feb 25, 2026):

same issue here


@rick-github commented on GitHub (Feb 25, 2026):

0.17.1 for qwen3.5 support.


@wangzhongliang commented on GitHub (Feb 25, 2026):

> 0.17.1 for qwen3.5 support.

I solved it by installing 0.17.1-rc1.


@shoque88 commented on GitHub (Feb 25, 2026):

When I run this command: `curl -fsSL https://ollama.com/install.sh | sh`, it installs 0.17.0.
How should I update to 0.17.1?


@rawzone commented on GitHub (Feb 25, 2026):

> when I run this command: curl -fsSL https://ollama.com/install.sh | sh it installs 0.17.0 how should I update to 0.17.1?

`curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc1 sh`

@shoque88 this seems to install a specific version.


@shoque88 commented on GitHub (Feb 25, 2026):

Thank you, solved


@SebastianGode commented on GitHub (Feb 25, 2026):

Sadly, it will not work with `ollama run hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL` yet, even on 0.17.1-rc1. Only the Q4_K_M variant works fine.


@rick-github commented on GitHub (Feb 25, 2026):

qwen3.5 models downloaded from HF run on the llama.cpp engine; support there will come with the next vendor sync.


@richardng1505 commented on GitHub (Feb 25, 2026):

Use https://github.com/ollama/ollama/releases/tag/v0.17.1-rc1; it works.


@Snify89 commented on GitHub (Feb 25, 2026):

Edit: my context size was too high, I guess. After lowering it, it worked.

I can't get it to work with 0.17.1-rc1 (Qwen3.5 35B A3B Q4_K_M):

time=2026-02-25T14:12:06.438Z level=INFO source=routes.go:1663 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:20000 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[moz-extension://* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-02-25T14:12:06.438Z level=INFO source=routes.go:1665 msg="Ollama cloud disabled: false"
time=2026-02-25T14:12:06.440Z level=INFO source=images.go:473 msg="total blobs: 33"
time=2026-02-25T14:12:06.440Z level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-25T14:12:06.441Z level=INFO source=routes.go:1718 msg="Listening on [::]:11434 (version 0.17.1-rc1)"
time=2026-02-25T14:12:06.441Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-25T14:12:06.441Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-02-25T14:12:06.442Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39011"
time=2026-02-25T14:12:06.665Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36591"
time=2026-02-25T14:12:06.877Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38485"
time=2026-02-25T14:12:07.098Z level=INFO source=types.go:42 msg="inference compute" id=GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" libdirs=ollama,cuda_v12 driver=12.9 pci_id=0000:06:00.0 type=discrete total="12.0 GiB" available="11.6 GiB"
time=2026-02-25T14:12:07.098Z level=INFO source=routes.go:1768 msg="vram-based default context" total_vram="12.0 GiB" default_num_ctx=4096
[GIN] 2026/02/25 - 14:12:07 | 200 | 2.290428ms | 127.0.0.1 | GET "/v1/models"
time=2026-02-25T14:12:07.321Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45629"
time=2026-02-25T14:12:07.543Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing "max": invalid syntax"
time=2026-02-25T14:12:07.724Z level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-02-25T14:12:07.724Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a --port 34401"
time=2026-02-25T14:12:07.724Z level=INFO source=sched.go:491 msg="system memory" total="15.5 GiB" free="15.3 GiB" free_swap="0 B"
time=2026-02-25T14:12:07.724Z level=INFO source=sched.go:498 msg="gpu memory" id=GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 library=CUDA available="11.2 GiB" free="11.6 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-25T14:12:07.724Z level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1
time=2026-02-25T14:12:07.737Z level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-25T14:12:07.737Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:34401"
time=2026-02-25T14:12:07.746Z level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:20000 KvCacheType: NumThreads:8 GPULayers:41[ID:GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T14:12:07.816Z level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1959 num_key_values=56
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-02-25T14:12:07.985Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-02-25T14:12:09.210Z level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:20000 KvCacheType: NumThreads:8 GPULayers:18[ID:GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 Layers:18(22..39)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T14:12:09.897Z level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:20000 KvCacheType: NumThreads:8 GPULayers:18[ID:GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 Layers:18(22..39)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T14:12:11.218Z level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:20000 KvCacheType: NumThreads:8 GPULayers:18[ID:GPU-6c444ebc-808c-b9ad-b300-e549f92e6ba2 Layers:18(22..39)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T14:12:11.218Z level=INFO source=ggml.go:482 msg="offloading 18 repeating layers to GPU"
time=2026-02-25T14:12:11.218Z level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-25T14:12:11.218Z level=INFO source=ggml.go:494 msg="offloaded 18/41 layers to GPU"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="9.1 GiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:245 msg="model weights" device=CPU size="13.1 GiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="878.0 MiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="1.1 GiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="916.2 MiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="214.6 MiB"
time=2026-02-25T14:12:11.218Z level=INFO source=device.go:272 msg="total memory" size="25.2 GiB"
time=2026-02-25T14:12:11.218Z level=INFO source=sched.go:566 msg="loaded runners" count=1
time=2026-02-25T14:12:11.218Z level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-02-25T14:12:11.218Z level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-25T14:12:20.980Z level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server not responding"
time=2026-02-25T14:12:22.165Z level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server error"
time=2026-02-25T14:12:22.415Z level=ERROR source=sched.go:572 msg="error loading llama server" error="llama runner process has terminated: signal: killed"
[GIN] 2026/02/25 - 14:12:22 | 500 | 15.318145428s | 192.168.178.48 | POST "/v1/chat/completions"
time=2026-02-25T14:12:25.419Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38265"
time=2026-02-25T14:12:26.058Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37819"
time=2026-02-25T14:12:26.309Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35919"
time=2026-02-25T14:12:26.559Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44635"
time=2026-02-25T14:12:26.808Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44417"
time=2026-02-25T14:12:27.059Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 46607"
time=2026-02-25T14:12:27.308Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40303"
time=2026-02-25T14:12:27.559Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35357"
time=2026-02-25T14:12:27.809Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38131"
time=2026-02-25T14:12:28.059Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43583"
time=2026-02-25T14:12:28.309Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43735"
time=2026-02-25T14:12:28.559Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40861"
time=2026-02-25T14:12:28.808Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33921"
time=2026-02-25T14:12:29.058Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34741"
time=2026-02-25T14:12:29.309Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37543"
time=2026-02-25T14:12:29.558Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41577"
time=2026-02-25T14:12:29.808Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40053"
time=2026-02-25T14:12:30.059Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37391"
time=2026-02-25T14:12:30.309Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45947"
time=2026-02-25T14:12:30.559Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42859"
time=2026-02-25T14:12:30.808Z level=INFO source=server.go:431 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38487"
time=2026-02-25T14:12:30.809Z level=INFO source=runner.go:464 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[] error="failed to finish discovery before timeout"
time=2026-02-25T14:12:30.809Z level=WARN source=runner.go:356 msg="unable to refresh free memory, using old values"
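Snify89's edit matches the log above: the runner died with `signal: killed` (likely the kernel OOM killer, given 15.5 GiB of system RAM against a ~25.2 GiB total-memory estimate), and KV-cache memory grows linearly with context length, so lowering the context shrinks the footprint. A back-of-envelope sketch; the layer count, KV-head count, and head dimension below are illustrative assumptions, not the real qwen3.5:35b configuration:

```python
# Why lowering the context helped: KV-cache memory grows linearly with
# num_ctx. All model dimensions below are ASSUMPTIONS for illustration,
# not the actual qwen3.5:35b configuration.

def kv_cache_bytes(num_ctx: int,
                   n_layers: int = 40,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Keys + values: 2 tensors per layer, each num_ctx * n_kv_heads * head_dim."""
    return 2 * n_layers * num_ctx * n_kv_heads * head_dim * bytes_per_elem

for ctx in (20000, 8192, 4096):
    print(f"num_ctx={ctx:>6}: ~{kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

With these stand-in dimensions, dropping from 20,000 to 4,096 tokens of context cuts the cache to roughly a fifth.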


@sleeplessai commented on GitHub (Feb 25, 2026):

verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: unable to load model: C:\Users\<username>\.ollama\models\blobs\sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a

@richardng1505 thanks for the solution; 0.17.1-rc1 works well.


@rluisr commented on GitHub (Feb 25, 2026):

hmm...

root@octa:~# ollama run hf.co/unsloth/Qwen3.5-27B-GGUF
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-728960e4dda52d4f2af5bee09b2cbe86addfa93220fe9324bfac9dc727605c17
root@octa:~# vim /etc/systemd/system/ollama.service
root@octa:~# /usr/local/bin/ollama --version
ollama version is 0.17.1-rc1

@rick-github commented on GitHub (Feb 25, 2026):

@rluisr https://github.com/ollama/ollama/issues/14419#issuecomment-3959159035


@pitchinnate commented on GitHub (Feb 25, 2026):

Running ollama version `0.17.1-rc1`, I tried the 27B and 35B variants and both returned a 500 error. Not sure if this log output is helpful.

ollama run hf.co/unsloth/Qwen3.5-27B-GGUF:Q4_K_M

print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 15.58 GiB (4.98 BPW) 
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model

ollama run hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M

print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 19.74 GiB (4.89 BPW) 
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load_from_file_impl: failed to load model
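The `unknown model architecture: 'qwen35'` / `'qwen35moe'` errors mean the llama.cpp build bundled with this release doesn't recognize the architecture string stored in the GGUF header, which matches rick-github's note that HF-downloaded qwen3.5 models need the next vendor sync. You can inspect that string yourself before trying a new build; below is a minimal sketch of a GGUF metadata reader (little-endian GGUF v2/v3 only, based on the published GGUF layout; not a full parser):

```python
import struct

def gguf_architecture(path: str) -> str:
    """Return the `general.architecture` string from a GGUF file header.

    Minimal reader for little-endian GGUF v2/v3; just enough to see which
    architecture string (e.g. 'qwen35moe') llama.cpp will be asked to load.
    """
    # Fixed byte widths for scalar metadata value types (GGUF type ids).
    scalar = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
    with open(path, "rb") as f:
        u32 = lambda: struct.unpack("<I", f.read(4))[0]
        u64 = lambda: struct.unpack("<Q", f.read(8))[0]
        string = lambda: f.read(u64()).decode("utf-8")

        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        u32()            # version
        u64()            # tensor count
        n_kv = u64()     # number of metadata key/value pairs

        for _ in range(n_kv):
            key, vtype = string(), u32()
            if vtype == 8:                      # string value
                value = string()
                if key == "general.architecture":
                    return value
            elif vtype == 9:                    # array: element type, count, elements
                etype, count = u32(), u64()
                if etype == 8:
                    for _ in range(count):
                        f.seek(u64(), 1)        # skip each string body
                else:
                    f.seek(scalar[etype] * count, 1)
            else:                               # fixed-size scalar
                f.seek(scalar[vtype], 1)
    raise ValueError("general.architecture key not found")
```

If the reported architecture isn't one your llama.cpp build knows, no quantization variant of that file will load; only a build that adds the architecture helps.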

@SebastianGode commented on GitHub (Feb 25, 2026):

@pitchinnate scroll a bit up here :)

<!-- gh-comment-id:3960997650 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@pitchinnate https://github.com/ollama/ollama/issues/14419#issuecomment-3959159035

<!-- gh-comment-id:3961077576 -->
Author
Owner

@sleeplessai commented on GitHub (Feb 25, 2026):

hmm...

root@octa:~# ollama run hf.co/unsloth/Qwen3.5-27B-GGUF
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-728960e4dda52d4f2af5bee09b2cbe86addfa93220fe9324bfac9dc727605c17
root@octa:~# vim /etc/systemd/system/ollama.service
root@octa:~# /usr/local/bin/ollama --version
ollama version is 0.17.1-rc1

It only works with `ollama run qwen3.5:35b`.

<!-- gh-comment-id:3961100471 -->
Author
Owner

@pitchinnate commented on GitHub (Feb 25, 2026):

Sorry I misunderstood this comment then:

It will not work with ollama run hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL sadly, even on 0.17.1-rc1 yet. Only the Q4_K_M variant works fine.

I thought they were saying the hf Q4_K_M variant worked.

<!-- gh-comment-id:3961103123 -->
Author
Owner

@filmo commented on GitHub (Feb 25, 2026):

I'm getting a very similar error with RTX-3090s. I've 'uninstalled' it via 'ollama rm' and by manually deleting the sha file in models and then re-installing. Neither solved the problem:

Operating System: Debian GNU/Linux 12 (bookworm)  
          Kernel: Linux 6.8.12-11-pve
    Architecture: x86-64


root@openwebui:~# ollama -v
ollama version is 0.17.0

root@openwebui:~# nvidia-smi 
Wed Feb 25 10:22:35 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0 Off |                  N/A |
|  0%   37C    P8             25W /  300W |       4MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  |   00000000:05:00.0 Off |                  N/A |
| 30%   21C    P8             31W /  300W |       4MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        On  |   00000000:08:00.0 Off |                  N/A |
| 33%   22C    P8             20W /  300W |       4MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        On  |   00000000:09:00.0 Off |                  N/A |
| 32%   21C    P8             20W /  300W |       4MiB /  24576MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+

root@openwebui:~/.ollama/models/blobs# ollama pull qwen3.5:35b
pulling manifest 
pulling d838916ba05b: 100%                 
verifying sha256 digest 
writing manifest 
success 

root@openwebui:~/.ollama/models/blobs# ollama run qwen3.5:35b
Error: 500 Internal Server Error: unable to load model: /root/.ollama/models/blobs/sha256-d838916ba05b9d908e9c3fecf16273b942a99aae94d1725c3e9fdd772522cf1a

I have not tried release candidate 0.17.1

<!-- gh-comment-id:3961246569 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@filmo https://github.com/ollama/ollama/issues/14419#issuecomment-3958763362

<!-- gh-comment-id:3961261177 -->
Author
Owner

@davidvv commented on GitHub (Feb 25, 2026):

Same issue here on macOS with Ollama 0.17.0 and Apple M4 Pro. Getting Error: 500 Internal Server Error: unable to load model when trying to run qwen3.5:35b. The model file exists but fails to load. Other models like glm-4.7-flash work fine.

<!-- gh-comment-id:3961510029 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@davidvv https://github.com/ollama/ollama/issues/14419#issuecomment-3958763362

<!-- gh-comment-id:3961518007 -->
Author
Owner

@VideoFX commented on GitHub (Feb 25, 2026):

also, I can't disable thinking. 0.17.1-rc1

<!-- gh-comment-id:3961557402 -->
Author
Owner

@Avaruz commented on GitHub (Feb 25, 2026):

I got this error:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load_from_file_impl: failed to load model

ollama version is 0.17.0
2 NVIDIA GTX3090 24 GB each

<!-- gh-comment-id:3961802165 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@Avaruz https://github.com/ollama/ollama/issues/14419#issuecomment-3958763362

<!-- gh-comment-id:3961811106 -->
Author
Owner

@alnaranjo commented on GitHub (Feb 25, 2026):

also, I can't disable thinking. 0.17.1-rc1

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc2 sh

<!-- gh-comment-id:3961813108 -->
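For anyone wanting to pin a specific build rather than wait for an auto-update, the `OLLAMA_VERSION` override used in the command above can be followed by a quick check of what actually landed on `PATH`. A sketch (the version-string format is the `ollama version is ...` line quoted throughout this thread):

```shell
# Install the rc build the commenters reference.
# OLLAMA_VERSION is the install script's version override.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc2 sh

# `ollama --version` prints "ollama version is X.Y.Z[-rcN]"; the last
# whitespace-separated field is the version itself.
ver_line="$(ollama --version)"
ver="${ver_line##* }"
echo "$ver"
```

If the printed version is still 0.17.0, an older binary earlier on `PATH` (or a still-running old server) is being picked up.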
Author
Owner

@VideoFX commented on GitHub (Feb 25, 2026):

also, I can't disable thinking. 0.17.1-rc1

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc2 sh

Thank you, I see updates 2 hours ago.

<!-- gh-comment-id:3962204796 -->
Author
Owner

@YJesus commented on GitHub (Feb 25, 2026):

ollama --version
ollama version is 0.17.1-rc2

ollama run hf.co/unsloth/Qwen3.5-27B-GGUF:Q6_K
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-b0700c220e072828ea194df6f5679adda6ffaf165b13cd3be91834efdd360361

ollama run hf.co/unsloth/Qwen3.5-35B-A3B-GGUF:IQ4_XS
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-4ddc2fdfa0a6825967bbc1d3bddf703045523df60bbc26ba1f30db181850e4a7

<!-- gh-comment-id:3962977362 -->
Author
Owner

@rick-github commented on GitHub (Feb 25, 2026):

@YJesus https://github.com/ollama/ollama/issues/14419#issuecomment-3959159035

<!-- gh-comment-id:3962983846 -->
Author
Owner

@VideoFX commented on GitHub (Feb 26, 2026):

also, I can't disable thinking. 0.17.1-rc1

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.1-rc2 sh

Thank you, I see updates 2 hours ago.

I was able to disable thinking after updating to "qwen3.5:35b-a3b-q4_K_M" and 0.17.1-rc2

<!-- gh-comment-id:3963271454 -->
Author
Owner

@0x7CFE commented on GitHub (Feb 26, 2026):

In my case 0.17.1-rc2 and the more recent 0.17.1 both crash when trying to use the model with OLLAMA_VULKAN=1. CPU-only works fine.

$ ollama --version
ollama version is 0.17.1
$ ollama run qwen3.5:35b
>>> /set verbose
Set 'verbose' mode.
>>> Tell me about yourself.
Error: 500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details
фев 26 10:58:04 fw13 ollama[878128]: SIGSEGV: segmentation violation
фев 26 10:58:04 fw13 ollama[878128]: PC=0x72394d6f057f m=19 sigcode=1 addr=0x669000
фев 26 10:58:04 fw13 ollama[878128]: signal arrived during cgo execution
фев 26 10:58:04 fw13 ollama[878128]: goroutine 2160 gp=0xc000f22c40 m=19 mp=0xc00058c808 [syscall]:
фев 26 10:58:04 fw13 ollama[878128]: runtime.cgocall(0x6143cf309330, 0xc00175daa0)
фев 26 10:58:04 fw13 ollama[878128]:         runtime/cgocall.go:167 +0x4b fp=0xc00175da78 sp=0xc00175da40 pc=0x6143ce3bba6b
фев 26 10:58:04 fw13 ollama[878128]: github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x614400d23090, 0x723894001c20)
фев 26 10:58:04 fw13 ollama[878128]:         _cgo_gotypes.go:979 +0x4a fp=0xc00175daa0 sp=0xc00175da78 pc=0x6143ce8a6b0a
фев 26 10:58:04 fw13 ollama[878128]: github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify.func2(...)
фев 26 10:58:04 fw13 ollama[878128]:         github.com/ollama/ollama/ml/backend/ggml/ggml.go:825
фев 26 10:58:04 fw13 ollama[878128]: github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify(0xc000f77580, 0xc00164c280?, {0xc00185f7b0, 0x1, 0x2?})
фев 26 10:58:04 fw13 ollama[878128]:         github.com/ollama/ollama/ml/backend/ggml/ggml.go:825 +0x1b2 fp=0xc00175db78 sp=0xc00175daa0 pc=0x6143ce8b5492
фев 26 10:58:04 fw13 ollama[878128]: github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc00023b0e0, {0x0, {0x6143cfc609d0, 0xc000f77580}, {0x6143cfc6de30, 0xc00174e048}, {0xc0016e8f00, 0xf, 0x10}, {{0x6143cfc6de30, ...}, ...}, ...})
фев 26 10:58:04 fw13 ollama[878128]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:716 +0x862 fp=0xc00175def0 sp=0xc00175db78 pc=0x6143ce9e0282
<!-- gh-comment-id:3964315496 -->
Author
Owner

@yezhoujie commented on GitHub (Feb 26, 2026):

Same error for me on a MacBook Pro M3 Max 36 GB

ollama run qwen3.5-27b:latest
Error: 500 Internal Server Error: unable to load model: /Users/yzj/.ollama/models/blobs/sha256-4086d669ca6a7a8777ae8fd8506bf971abd2aece175b65354c63dffb6cc574f2
<!-- gh-comment-id:3964393934 -->
Author
Owner

@j3g commented on GitHub (Feb 26, 2026):

on version 0.17.1

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'

<!-- gh-comment-id:3964929242 -->
Author
Owner

@trollize commented on GitHub (Feb 26, 2026):

Do you have the mmproj projection file for Qwen3.5-27B in the same directory?

mmproj-BF16.gguf

I don't have it, and the error log shows:

time=2026-02-25T13:39:54.263-05:00 level=INFO source=sched.go:498 msg="gpu memory" id=GPU-24ceeb4c-8772-4bc9-7d0a-b91e6f600874 library=CUDA available="14.3 GiB" free="14.7 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-25T13:39:54.263-05:00 level=INFO source=server.go:757 msg="loading model" "model layers"=41 requested=-1
time=2026-02-25T13:39:54.300-05:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-02-25T13:39:54.301-05:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:51684"
time=2026-02-25T13:39:54.307-05:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:41[ID:GPU-24ceeb4c-8772-4bc9-7d0a-b91e6f600874 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-25T13:39:54.333-05:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1959 num_key_values=56
////
[GIN] 2026/02/25 - 13:52:05 | 200 | 15.2596ms | 10.0.0.154 | GET "/api/tags"
[GIN] 2026/02/25 - 13:52:11 | 200 | 0s | 10.0.0.154 | HEAD "/"
[GIN] 2026/02/25 - 13:52:11 | 404 | 7.8041ms | 10.0.0.154 | POST "/api/show"
time=2026-02-25T13:52:12.265-05:00 level=INFO source=download.go:179 msg="downloading a98ab071b984 in 16 979 MB part(s)"
[GIN] 2026/02/25 - 13:52:26 | 204 | 0s | 10.0.0.154 | OPTIONS "/api/chat"
time=2026-02-25T13:52:26.776-05:00 level=WARN source=types.go:976 msg="invalid option provided" option=keep_alive
time=2026-02-25T13:52:26.803-05:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\Users\Admin\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 65050"
time=2026-02-25T13:52:26.975-05:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-25T13:52:26.975-05:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-02-25T13:52:26.975-05:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=20 efficiency=12 threads=20
llama_model_loader: loaded meta data with 42 key-value pairs and 851 tensors from E:\Users\AI\.ollama\models\blobs\sha256-eefa528ef8e8948a3b177e562cbd7554c8536914fd9278ac4eaf83bcceff1b27 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 0.600000
llama_model_loader: - kv 5: general.name str = Qwen3.5-27B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-27B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 27B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-2...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.tags arr[str,1] = ["image-text-to-text"]
llama_model_loader: - kv 13: qwen35.block_count u32 = 64
llama_model_loader: - kv 14: qwen35.context_length u32 = 262144
llama_model_loader: - kv 15: qwen35.embedding_length u32 = 5120
llama_model_loader: - kv 16: qwen35.feed_forward_length u32 = 17408
llama_model_loader: - kv 17: qwen35.attention.head_count u32 = 24
llama_model_loader: - kv 18: qwen35.attention.head_count_kv u32 = 4
llama_model_loader: - kv 19: qwen35.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 20: qwen35.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 21: qwen35.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 22: qwen35.attention.key_length u32 = 256
llama_model_loader: - kv 23: qwen35.attention.value_length u32 = 256
llama_model_loader: - kv 24: qwen35.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 25: qwen35.ssm.state_size u32 = 128
llama_model_loader: - kv 26: qwen35.ssm.group_count u32 = 16
llama_model_loader: - kv 27: qwen35.ssm.time_step_rank u32 = 48
llama_model_loader: - kv 28: qwen35.ssm.inner_size u32 = 6144
llama_model_loader: - kv 29: qwen35.full_attention_interval u32 = 4
llama_model_loader: - kv 30: qwen35.rope.dimension_count u32 = 64
llama_model_loader: - kv 31: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 32: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 33: tokenizer.ggml.tokens arr[str,248320] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 34: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 35: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 36: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 37: tokenizer.ggml.padding_token_id u32 = 248044
llama_model_loader: - kv 38: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 39: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 40: general.quantization_version u32 = 2
llama_model_loader: - kv 41: general.file_type u32 = 14
llama_model_loader: - type f32: 353 tensors
llama_model_loader: - type q8_0: 96 tensors
llama_model_loader: - type q4_K: 341 tensors
llama_model_loader: - type q5_K: 60 tensors
llama_model_loader: - type q6_K: 1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Small
print_info: file size = 14.68 GiB (4.69 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model

<!-- gh-comment-id:3965116391 -->
Author
Owner

@j-wiedemann commented on GitHub (Feb 26, 2026):

I'm a total noob at this...
I'm pretty sure qwen3.5:35b ran well on GPU with 0.17.1-rc1 yesterday. I mainly use it with Newelle.
But today I upgraded to 0.17.1 and it either returns error 500 or just runs on the CPU.
Is there an easy way to downgrade to 0.17.1-rc1? I'm on Ubuntu 25.10.

<!-- gh-comment-id:3966090969 -->
Author
Owner

@MAWK0235 commented on GitHub (Feb 26, 2026):

The current release of ollama fails to use qwen3.5 past a single chat before hitting memory issues (500 error).

(I meet the qwen3.5 system requirements.)

[Image: https://github.com/user-attachments/assets/4fe712c2-8b86-4538-9727-aff4fc231228]
<!-- gh-comment-id:3967280229 -->
Author
Owner

@Noyze-AI commented on GitHub (Feb 26, 2026):

[Image: https://github.com/user-attachments/assets/451b597f-aaf5-474d-9b80-872d9e3b7c09]

on version 0.17.1 5090dv2 win11

500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details

<!-- gh-comment-id:3968092702 -->
Author
Owner

@rick-github commented on GitHub (Feb 26, 2026):

Server logs will aid in debugging.

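For anyone pulling server logs for a crash like this: the lines worth pasting into a report are usually the `error loading model` / `NewLlamaServer failed` ones. A minimal sketch of filtering them with `grep`, run here against a simulated log excerpt (on Linux installs the real log comes from `journalctl -u ollama`; elsewhere, from the server console):

```shell
# Simulated excerpt of an ollama server log; in a real session you would
# feed the actual log file or `journalctl -u ollama --no-pager` output.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
time=... level=INFO source=server.go:431 msg="starting runner"
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
time=... level=INFO source=sched.go:473 msg="NewLlamaServer failed"
EOF

# Keep only the failure lines worth including in a bug report.
grep -E 'error loading model|NewLlamaServer failed' "$LOG"
```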

@MAWK0235 commented on GitHub (Feb 26, 2026):

The server logs only report the "500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details" exception.

Even with more verbosity I couldn't get more information from my system.

It looks like it panics and unloads the model from memory without leaving any more information on my end.


@MAWK0235 commented on GitHub (Feb 26, 2026):

I can triple check my log files when I get back to the workstation later


@rick-github commented on GitHub (Feb 26, 2026):

@Noyze-AI Different problem, open a new issue.


@sindab commented on GitHub (Feb 26, 2026):

PS C:\Users\Korisnik\Projekti\AI\qwen35> ollama --version
ollama version is 0.17.1
PS C:\Users\Korisnik\Projekti\AI\qwen35> ollama run hf.co/unsloth/Qwen3.5-27B-GGUF:Q4_K_M
Error: 500 Internal Server Error: unable to load model: C:\Users\Korisnik\.ollama\models\blobs\sha256-728960e4dda52d4f2af5bee09b2cbe86addfa93220fe9324bfac9dc727605c17


@rick-github commented on GitHub (Feb 26, 2026):

@sindab https://github.com/ollama/ollama/issues/14419#issuecomment-3959159035


@joenorton commented on GitHub (Feb 26, 2026):

Can confirm: qwen3.5:35b-a3b-q4_K_M throws a 500 error on ollama version 0.17.1.


@Matzebhv commented on GitHub (Feb 26, 2026):

The Windows version 0.17.1 runs fine with qwen3.5:35b here. Frigate and Home Assistant are connected as well, and everything runs as expected.
Ollama is running as a server on my AI Max+ 395 with 64 GB.

Image Image Image

@j3g commented on GitHub (Feb 26, 2026):

If you are on macOS, this exact file works in LM Studio. From my research, it's because the llama.cpp in Ollama is out of date (not the latest), while LM Studio's is current. Since you have already downloaded the model, you can give it a symbolic link and LM Studio will discover it immediately. Note that Ollama names files with SHA hashes, which makes life annoying; find the file that is 21 GB.
PS: The Zed editor automagically found the LM Studio local server, and I was using the model without any pain. Adding it in OpenCode was easy, too.

> mkdir -p ~/.cache/lm-studio/models/unsloth/Qwen3.5-35B-A3B-GGUF
> ln -s ~/.ollama/models/blobs/sha256-e8c60ba898493e3b8141c287ecb016c9bcaa9d8e745775ef26cc81511945a673 \
>   ~/.cache/lm-studio/models/unsloth/Qwen3.5-35B-A3B-GGUF/Q4_K_M.gguf

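To find the 21 GB blob without guessing hashes, sort the blob directory by size; the model weights come first. A self-contained sketch, demonstrated on a temporary directory here (in a real setup, point `BLOBS` at `~/.ollama/models/blobs`):

```shell
# Stand-in for ~/.ollama/models/blobs, so the snippet runs anywhere.
BLOBS=$(mktemp -d)
printf 'small' > "$BLOBS/sha256-aaa"
printf 'this one is the big model blob' > "$BLOBS/sha256-bbb"

# Largest file first; in a real store this is the model weights blob.
ls -S "$BLOBS" | head -n 1
```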

@rick-github commented on GitHub (Feb 26, 2026):

@iChristGit Different issue: #14444


@YJesus commented on GitHub (Feb 27, 2026):

> qwen3.5 models downloaded from HF run on the llama.cpp engine. Next vendor sync.

Is there any estimated timeline for the next sync? Thank you.


@SAXEM1997 commented on GitHub (Feb 27, 2026):

Failed to load model with "unknown model architecture: 'qwen35moe'" on ollama version 0.17.4.
The model is created from: https://huggingface.co/mradermacher/Qwen3.5-35B-A3B-heretic-GGUF/tree/main

time=2026-02-27T19:58:05.823+08:00 level=INFO source=server.go:431 msg="starting runner" cmd="C:\Users\59200\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 8834"
time=2026-02-27T19:58:05.957+08:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-02-27T19:58:05.957+08:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-02-27T19:58:05.957+08:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=14 efficiency=8 threads=20
llama_model_loader: loaded meta data with 50 key-value pairs and 733 tensors from D:\data\ai\models\ollama\blobs\sha256-499c2c0a0394da8c0c7c22a93850d75e743579f1320d4d056f1db28c2045aba5 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3.5 35B A3B Heretic
llama_model_loader: - kv 6: general.finetune str = heretic
llama_model_loader: - kv 7: general.basename str = Qwen3.5
llama_model_loader: - kv 8: general.size_label str = 35B-A3B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 11: general.tags arr[str,5] = ["heretic", "uncensored", "decensored...
llama_model_loader: - kv 12: qwen35moe.block_count u32 = 40
llama_model_loader: - kv 13: qwen35moe.context_length u32 = 262144
llama_model_loader: - kv 14: qwen35moe.embedding_length u32 = 2048
llama_model_loader: - kv 15: qwen35moe.attention.head_count u32 = 16
llama_model_loader: - kv 16: qwen35moe.attention.head_count_kv u32 = 2
llama_model_loader: - kv 17: qwen35moe.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 18: qwen35moe.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 19: qwen35moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 20: qwen35moe.expert_count u32 = 256
llama_model_loader: - kv 21: qwen35moe.expert_used_count u32 = 8
llama_model_loader: - kv 22: qwen35moe.attention.key_length u32 = 256
llama_model_loader: - kv 23: qwen35moe.attention.value_length u32 = 256
llama_model_loader: - kv 24: qwen35moe.expert_feed_forward_length u32 = 512
llama_model_loader: - kv 25: qwen35moe.expert_shared_feed_forward_length u32 = 512
llama_model_loader: - kv 26: qwen35moe.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 27: qwen35moe.ssm.state_size u32 = 128
llama_model_loader: - kv 28: qwen35moe.ssm.group_count u32 = 16
llama_model_loader: - kv 29: qwen35moe.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 30: qwen35moe.ssm.inner_size u32 = 4096
llama_model_loader: - kv 31: qwen35moe.full_attention_interval u32 = 4
llama_model_loader: - kv 32: qwen35moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 34: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,248320] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 39: tokenizer.ggml.padding_token_id u32 = 248044
llama_model_loader: - kv 40: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 41: general.quantization_version u32 = 2
llama_model_loader: - kv 42: general.file_type u32 = 15
llama_model_loader: - kv 43: general.url str = https://huggingface.co/mradermacher/Q...
llama_model_loader: - kv 44: mradermacher.quantize_version str = 2
llama_model_loader: - kv 45: mradermacher.quantized_by str = mradermacher
llama_model_loader: - kv 46: mradermacher.quantized_at str = 2026-02-26T08:06:07+01:00
llama_model_loader: - kv 47: mradermacher.quantized_on str = nico1
llama_model_loader: - kv 48: general.source.url str = https://huggingface.co/brayniac/Qwen3...
llama_model_loader: - kv 49: mradermacher.convert_type str = hf
llama_model_loader: - type f32: 301 tensors
llama_model_loader: - type q4_K: 355 tensors
llama_model_loader: - type q5_K: 30 tensors
llama_model_loader: - type q6_K: 47 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 19.71 GiB (4.88 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load_from_file_impl: failed to load model
time=2026-02-27T19:58:06.124+08:00 level=INFO source=sched.go:473 msg="NewLlamaServer failed" model=D:\data\ai\models\ollama\blobs\sha256-499c2c0a0394da8c0c7c22a93850d75e743579f1320d4d056f1db28c2045aba5 error="unable to load model: D:\data\ai\models\ollama\blobs\sha256-499c2c0a0394da8c0c7c22a93850d75e743579f1320d4d056f1db28c2045aba5"

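The `unknown model architecture` failure above can be confirmed without attempting a load: GGUF stores the `general.architecture` key in the metadata near the start of the file, so stripping the first few KB down to printable text is a crude but quick check (a heuristic, not a GGUF parser; the synthetic file below stands in for a real blob):

```shell
# Fake the first bytes of a GGUF blob: magic bytes, then the
# general.architecture key and its value, roughly as GGUF lays them out.
BLOB=$(mktemp)
printf 'GGUF\003\000\000\000general.architecture\010qwen35moe\n' > "$BLOB"

# Crude peek: replace non-printable bytes with newlines, then look for
# the architecture string. On a real blob, substitute its path for $BLOB.
head -c 4096 "$BLOB" | tr -c '[:print:]' '\n' | grep -x 'qwen35moe'
```

If the reported architecture is one the installed engine does not know (here `qwen35moe`), no amount of reloading will help; only a newer runner can.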

@rick-github commented on GitHub (Feb 27, 2026):

More information on HF qwen3.5 and Ollama in #14503. Closing this, as the initial problem of loading qwen3.5 is resolved by using 0.17.1+.


@kuolung1 commented on GitHub (Mar 30, 2026):

Same problem in 0.18.3 when loading:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
ollama run hf.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q6_K


@jonmach commented on GitHub (Mar 31, 2026):

I get the same error on 0.19.0:

$ ollama run moophlo/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
Error: 500 Internal Server Error: unable to load model: /Users/xxx/.ollama/models/blobs/sha256-c071a7725ffcafb8c4faf41b3327f2cd5e162308bdd3431d6232d0405dc1d905


Reference: github-starred/ollama#55873