[GH-ISSUE #14638] No parallel requests for qwen35moe #35244

Closed
opened 2026-04-22 19:37:44 -05:00 by GiteaMirror · 2 comments

Originally created by @chigkim on GitHub (Mar 5, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14638

What is the issue?

I've been experimenting with a multi-agent workflow in Codex, and I noticed that the timing of request processing was odd.
I then manually sent multiple requests to Ollama with a Python script and realized that parallel requests don't work.
I also found this in the log:
level=WARN source=sched.go:450 msg="model architecture does not currently support parallel requests" architecture=qwen35moe
Here's the full log:
https://pastebin.com/Tcz7uPMt
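
To reproduce this manually, a script along these lines can fire two requests at once (a minimal sketch, assuming the default endpoint at http://localhost:11434 and the model tag from the log below, qwen3.5:35b-a3b-q8_0; the prompts are placeholders):

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumptions: default Ollama endpoint and the model tag seen in the log.
URL = "http://localhost:11434/v1/chat/completions"
MODEL = "qwen3.5:35b-a3b-q8_0"

def chat(prompt: str) -> float:
    """POST one chat completion and return its wall-clock latency."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.monotonic() - start

# Submit both requests simultaneously. With OLLAMA_NUM_PARALLEL=2 and a
# model whose architecture supports parallelism, both latencies should be
# similar; if the scheduler serializes them, the second request's latency
# is roughly the first's plus its own processing time.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(chat, p)
               for p in ("Count to 100.", "Name 100 animals.")]
    for f in futures:
        print(f"latency: {f.result():.1f}s")
```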

Relevant log output

time=2026-03-05T06:40:37.002-05:00 level=DEBUG source=server.go:431 msg=subprocess OLLAMA_NO_CLOUD=0 OLLAMA_MODELS=/Users/cgk/.ollama/models OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 PATH="/Users/CGK/qt/6.10.1/macos/bin:/Users/cgk/.pyenv/shims:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/pmk/env/global/bin:/Library/Apple/usr/bin:/Applications/VMware Fusion.app/Contents/Public" OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_NEW_ESTIMATES=1 OLLAMA_NUM_PARALLEL=2 OLLAMA_CONTEXT_LENGTH=65536 OLLAMA_HOST=0.0.0.0 DYLD_LIBRARY_PATH=/Applications/Ollama.app/Contents/Resources OLLAMA_LIBRARY_PATH=/Applications/Ollama.app/Contents/Resources
...
time=2026-03-05T06:40:37.818-05:00 level=WARN source=sched.go:450 msg="model architecture does not currently support parallel requests" architecture=qwen35moe
...
...
time=2026-03-05T06:40:37.880-05:00 level=INFO source=runner.go:1302 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:65536 KvCacheType: NumThreads:12 GPULayers:41[ID:0 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
...
time=2026-03-05T06:41:27.656-05:00 level=DEBUG source=sched.go:729 msg="evaluating already loaded" model=/Users/cgk/.ollama/models/blobs/sha256-acd3c29c18f07df11b02809f1787803dbf0ba97abcd16c26e38b75168fce79e0
time=2026-03-05T06:41:27.674-05:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=67196 format=""
time=2026-03-05T06:41:27.692-05:00 level=DEBUG source=recurrent_checkpoints.go:318 msg="qwen3next: checkpoint miss" seq=0 slot=0 target=1 size=1 min=16 max=16 last=16
time=2026-03-05T06:41:27.692-05:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=488 prompt=15659 used=0 remaining=15659
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_q8_0_f32', name = 'kernel_mul_mm_q8_0_f32_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_q8_0_f32_bci=0_bco=0            0xbcf807600 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_ssm_conv_f32_f32_batched_4', name = 'kernel_ssm_conv_f32_f32_batched_4_ssm_conv_bs=256'
ggml_metal_library_compile_pipeline: loaded kernel_ssm_conv_f32_f32_batched_4_ssm_conv_bs=256      0xbcf807900 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_id_map0_ne20_8', name = 'kernel_mul_mm_id_map0_ne20_8_ne02=256'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_id_map0_ne20_8_ne02=256         0xbcf807c00 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_id_q8_0_f32', name = 'kernel_mul_mm_id_q8_0_f32_bci=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_id_q8_0_f32_bci=0               0xbcf8b4000 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_blk', name = 'kernel_flash_attn_ext_blk_nqptg=8_ncpsg=64'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_blk_nqptg=8_ncpsg=64      0xbcf8b4300 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_f16_dk256_dv256', name = 'kernel_flash_attn_ext_f16_dk256_dv256_mask=1_sinks=0_bias=0_scap=0_kvpad=0_bcm=0_ns10=512_ns20=512_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_f16_dk256_dv256_mask=1_sinks=0_bias=0_scap=0_kvpad=0_bcm=0_ns10=512_ns20=512_nsg=4      0xbcf8b4600 | th_max = 1024 | th_width =   32
time=2026-03-05T06:41:28.737-05:00 level=DEBUG source=sched.go:729 msg="evaluating already loaded" model=/Users/cgk/.ollama/models/blobs/sha256-acd3c29c18f07df11b02809f1787803dbf0ba97abcd16c26e38b75168fce79e0
time=2026-03-05T06:41:28.745-05:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=37333 format=""
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_f16_dk256_dv256', name = 'kernel_flash_attn_ext_f16_dk256_dv256_mask=1_sinks=0_bias=0_scap=0_kvpad=0_bcm=1_ns10=512_ns20=512_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_f16_dk256_dv256_mask=1_sinks=0_bias=0_scap=0_kvpad=0_bcm=1_ns10=512_ns20=512_nsg=4      0xbcf8b4900 | th_max = 1024 | th_width =   32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_vec_f16_dk256_dv256', name = 'kernel_flash_attn_ext_vec_f16_dk256_dv256_mask=1_sink=0_bias=0_scap=0_kvpad=0_ns10=512_ns20=512_nsg=4_nwg=32'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_vec_f16_dk256_dv256_mask=1_sink=0_bias=0_scap=0_kvpad=0_ns10=512_ns20=512_nsg=4_nwg=32      0xbcf8b4c00 | th_max = 1024 | th_width =   32
[GIN] 2026/03/05 - 06:43:08 | 200 |         1m40s |  192.168.99.177 | POST     "/v1/chat/completions"
time=2026-03-05T06:43:08.282-05:00 level=DEBUG source=sched.go:431 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:35b-a3b-q8_0 runner.inference="[{ID:0 Library:Metal}]" runner.size="39.9 GiB" runner.vram="39.9 GiB" runner.parallel=1 runner.pid=55938 runner.model=/Users/cgk/.ollama/models/blobs/sha256-acd3c29c18f07df11b02809f1787803dbf0ba97abcd16c26e38b75168fce79e0 runner.num_ctx=65536
time=2026-03-05T06:43:08.282-05:00 level=DEBUG source=sched.go:354 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:35b-a3b-q8_0 runner.inference="[{ID:0 Library:Metal}]" runner.size="39.9 GiB" runner.vram="39.9 GiB" runner.parallel=1 runner.pid=55938 runner.model=/Users/cgk/.ollama/models/blobs/sha256-acd3c29c18f07df11b02809f1787803dbf0ba97abcd16c26e38b75168fce79e0 runner.num_ctx=65536 refCount=1
time=2026-03-05T06:43:08.291-05:00 level=DEBUG source=recurrent_checkpoints.go:318 msg="qwen3next: checkpoint miss" seq=0 slot=0 target=113 size=9 min=511 max=16511 last=16511
time=2026-03-05T06:43:08.291-05:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=18032 prompt=7442 used=0 remaining=7442
[GIN] 2026/03/05 - 06:45:01 | 200 |         3m32s |  192.168.99.177 | POST     "/v1/chat/completions"
time=2026-03-05T06:45:01.563-05:00 level=DEBUG source=sched.go:431 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:35b-a3b-q8_0 runner.inference="[{ID:0 Library:Metal}]" runner.size="39.9 GiB" runner.vram="39.9 GiB" runner.parallel=1 runner.pid=55938 runner.model=/Users/cgk/.ollama/models/blobs/sha256-acd3c29c18f07df11b02809f1787803dbf0ba97abcd16c26e38b75168fce79e0 runner.num_ctx=65536
time=2026-03-05T06:45:01.563-05:00 level=DEBUG source=sched.go:336 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen3.5:35b-a3b-q8_0 runner.inference="[{ID:0 Library:Metal}]" runner.size="39.9 GiB" runner.vram="39.9 GiB" runner.parallel=1 runner.pid=55938 runner.model=/Users/cgk/.ollama/models/blobs/sha256-acd3c29c18f07df11b02809f1787803dbf0ba97abcd16c26e38b75168fce79e0 runner.num_ctx=65536 duration=5m0s
time=2026-03-05T06:45:01.563-05:00 level=DEBUG source=sched.go:354 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:35b-a3b-q8_0 runner.inference="[{ID:0 Library:Metal}]" runner.size="39.9 GiB" runner.vram="39.9 GiB" runner.parallel=1 runner.pid=55938 runner.model=/Users/cgk/.ollama/models/blobs/sha256-acd3c29c18f07df11b02809f1787803dbf0ba97abcd16c26e38b75168fce79e0 runner.num_ctx=65536 refCount=0
[GIN] 2026/03/05 - 06:45:37 | 200 |      58.334µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/03/05 - 06:45:37 | 200 |     192.792µs |       127.0.0.1 | GET      "/api/ps"
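
Worth noting from the [GIN] lines above: subtracting the reported durations from the completion timestamps shows both requests arrived within about a second of each other, yet the second only made real progress after the first finished. A quick arithmetic check (values copied from the log):

```python
from datetime import datetime, timedelta

# Finish times and durations, copied from the two [GIN] completion lines.
fin1 = datetime.fromisoformat("2026-03-05T06:43:08")
fin2 = datetime.fromisoformat("2026-03-05T06:45:01")
dur1 = timedelta(minutes=1, seconds=40)
dur2 = timedelta(minutes=3, seconds=32)

print(fin1 - dur1)  # 06:41:28 -- request 1 arrives
print(fin2 - dur2)  # 06:41:29 -- request 2 arrives ~1s later
print(fin2 - fin1)  # 0:01:53 -- actual processing window for request 2
```

So request 2 spent roughly 1m40s queued behind request 1 before its own ~1m53s of processing, which is consistent with runner.parallel=1 in the scheduler output despite OLLAMA_NUM_PARALLEL=2 in the environment.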

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.17.6

GiteaMirror added the bug label 2026-04-22 19:37:44 -05:00

@scmarvin commented on GitHub (Mar 6, 2026):

Please note that this issue was erroneously closed: it is unrelated to ticket 4165, which was created in May 2024, well before Qwen v3.5 existed. My testing indicates that this is an ongoing problem between the current version of Qwen (v3.5) and the current version of Ollama (v0.17.7), and that parallelism was perfectly functional under the previous version of Qwen (v3). Further research indicates that this model does support parallelism, but Ollama is degrading it. Kindly reopen and address this issue accordingly.
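
One way to substantiate the regression claim is to run the same concurrent-request timing against both model generations (a sketch only; the model tags here are illustrative assumptions, not verified tags):

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:11434/v1/chat/completions"  # default endpoint assumed

def latency(model: str, prompt: str) -> float:
    """POST one chat completion and return its wall-clock latency."""
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}],
                       "stream": False}).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.monotonic() - start

def concurrent_latencies(model: str) -> list[float]:
    # Two identical prompts submitted at the same moment.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futs = [pool.submit(latency, model, "Count to 50.") for _ in range(2)]
        return sorted(f.result() for f in futs)

# With OLLAMA_NUM_PARALLEL=2, a parallel-capable architecture should yield
# two similar latencies; a serialized one yields roughly 1x and 2x.
for tag in ("qwen3:30b-a3b-q8_0", "qwen3.5:35b-a3b-q8_0"):
    print(tag, [f"{t:.1f}s" for t in concurrent_latencies(tag)])
```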


@rick-github commented on GitHub (Mar 7, 2026):

https://github.com/ollama/ollama/blob/afb4c62fbf6839319dbe93c1bbb9eb7fc9a67c3e/server/sched.go#L446-L451

Reference: github-starred/ollama#35244