[GH-ISSUE #15128] Qwen3-Next:80b : doesn't load anymore after v0.18.x #9684

Closed
opened 2026-04-12 22:34:04 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Aelentel on GitHub (Mar 29, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15128

What is the issue?

The model loaded successfully before v0.18; after updating I get this error:

$ ollama -v && ollama run qwen3-next:80b

ollama version is 0.18.3
Error: 500 Internal Server Error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_gate projections
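
For triage, it may help to check whether the cached GGUF blob actually contains the fused layer-0 tensors the new loader complains about. Below is a minimal sketch, not a confirmed fix: it assumes the `gguf` Python package (which ships the `gguf-dump` CLI) and uses the models path and blob digest prefix exactly as they appear in the log below; the expected tensor names are inferred from the error message, not verified against the qwen3next loader source.

```shell
# Locate the cached blob via the sha256 prefix from the "Load failed" log line
# (the journal truncates the full digest; the wildcard avoids guessing the rest).
BLOB=$(find /usr/share/ollama/.ollama/models/blobs -name 'sha256-8476acca2ca7dc4dd86ad2e0*' | head -n 1)

# List layer-0 tensor names. If only split q/k/v projections show up and no
# fused attn_qkv/attn_gate tensors, the blob likely predates the tensor layout
# the v0.18.x qwen3next loader appears to expect.
pip install gguf   # provides the gguf-dump utility
gguf-dump "$BLOB" | grep 'blk\.0\.'
```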

Relevant log output

Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: [GIN] 2026/03/29 - 09:07:20 | 200 |      52.556µs |       127.0.0.1 | GET      "/api/version"
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: [GIN] 2026/03/29 - 09:07:20 | 200 |       28.23µs |       127.0.0.1 | HEAD     "/"
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: [GIN] 2026/03/29 - 09:07:20 | 200 |   90.213149ms |       127.0.0.1 | POST     "/api/show"
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: [GIN] 2026/03/29 - 09:07:20 | 200 |   85.191693ms |       127.0.0.1 | POST     "/api/show"
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: ggml_backend_cuda_device_get_memory device GPU-e34b0cab-38f4-fb3e-dfd1-538bd97bf7ab utilizing NVML memory reporting free: 44256460800 total: 100485038080
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.739Z level=INFO source=sched.go:627 msg="updated VRAM based on existing loaded models" gpu=GPU-e34b0cab-38f4-fb3e-dfd1-538bd97bf7a>
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.764Z level=WARN source=sched.go:423 msg="model architecture does not currently support parallel requests" architecture=qwen3next
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.796Z level=INFO source=server.go:247 msg="enabling flash attention"
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.796Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ol>
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.797Z level=INFO source=sched.go:484 msg="system memory" total="251.6 GiB" free="246.5 GiB" free_swap="0 B"
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.797Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-e34b0cab-38f4-fb3e-dfd1-538bd97bf7ab library=CUDA available="40.8 GiB">
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.797Z level=INFO source=server.go:759 msg="loading model" "model layers"=49 requested=-1
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.808Z level=INFO source=runner.go:1411 msg="starting ollama engine"
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.809Z level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:35725"
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.818Z level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled>
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.840Z level=INFO source=ggml.go:136 msg="" architecture=qwen3next file_type=Q4_K_M name="Qwen3 Next 80B A3B Thinking" description=">
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: ggml_cuda_init: found 1 CUDA devices:
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]:   Device 0: NVIDIA H100 NVL, compute capability 9.0, VMM: yes, ID: GPU-e34b0cab-38f4-fb3e-dfd1-538bd97bf7ab
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v13/libggml-cuda.so
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.907Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.B>
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.914Z level=INFO source=server.go:1218 msg="llm load error: failed to initialize model: qwen3next: layer 0 missing attn_qkv/attn_ga>
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.915Z level=INFO source=runner.go:1284 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:Disable>
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: time=2026-03-29T09:07:20.915Z level=INFO source=sched.go:511 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-8476acca2ca7dc4dd86ad2e0>
Mar 29 09:07:20 gof-zur-nut-vm490-023 ollama[816865]: [GIN] 2026/03/29 - 09:07:20 | 500 |  310.450484ms |       127.0.0.1 | POST     "/api/generate"
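
Two quick checks that may narrow this down while the issue is open (hedged; neither is a confirmed fix). The first re-pulls the model in case the registry now hosts a re-converted GGUF; the second pins a pre-0.18 release via the install script's `OLLAMA_VERSION` variable to confirm the regression boundary. The version number is a placeholder, not a known-good release.

```shell
# Check 1: remove the local blob and re-pull, in case a re-converted GGUF
# with the fused projections has been published since.
ollama rm qwen3-next:80b
ollama pull qwen3-next:80b

# Check 2: pin the previous release to confirm the regression is in v0.18.x.
# "0.17.9" is a placeholder; substitute whichever pre-0.18 version worked.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.17.9 sh
```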

OS

Ubuntu 24.04.4 LTS

GPU

Nvidia H100

CPU

Intel Xeon Processor

Ollama version

v0.18.3

GiteaMirror added the bug label 2026-04-12 22:34:04 -05:00
Reference: github-starred/ollama#9684