[GH-ISSUE #12910] [BUG] The qwen3-vl:30b-a3b-instruct model cannot output in parallel #34318

Closed
opened 2026-04-22 17:46:00 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @XiaZixun on GitHub (Nov 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12910

What is the issue?

I installed the Ollama service on my Mac via `brew services start ollama` and set `OLLAMA_NUM_PARALLEL=4`.

With `qwen3:4b-instruct` or `gemma3:27b-it-qat`, the service streams output to multiple windows in parallel as expected.

With `qwen3-vl:30b-a3b-instruct`, however, it always waits for one window's output to complete before the other window begins streaming.

Given that I have 64 GB of GPU memory and the `qwen3-vl:30b-a3b-instruct` weights are only about 20 GB, similar in size to `gemma3:27b-it-qat`, I don't think insufficient memory is the cause.

I haven't tested whether other Qwen3 VL models exhibit the same issue.
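The "one window waits for the other" symptom can be checked objectively by logging a timestamp per streamed token in each window and testing whether the sessions' token time windows overlap. A minimal sketch of such a check (hypothetical helper with synthetic timestamps, not part of Ollama; real timestamps could be captured by wrapping the API stream):

```python
def sessions_overlap(events):
    """events: list of (session_id, timestamp) pairs, one per streamed token.
    Returns True if any two sessions' [first-token, last-token] windows overlap,
    i.e. the responses were actually generated in parallel."""
    windows = {}
    for sid, ts in events:
        lo, hi = windows.get(sid, (ts, ts))
        windows[sid] = (min(lo, ts), max(hi, ts))
    spans = sorted(windows.values())
    # Adjacent spans overlap when one ends after the next one starts.
    return any(prev_hi > cur_lo
               for (_, prev_hi), (cur_lo, _) in zip(spans, spans[1:]))

# Serialized behaviour: session "b" starts only after "a" finishes.
serial = [("a", 0.0), ("a", 1.0), ("b", 2.0), ("b", 3.0)]
# Parallel behaviour: tokens from both sessions interleave in time.
parallel = [("a", 0.0), ("b", 0.5), ("a", 1.0), ("b", 1.5)]
```

With logs like these, `sessions_overlap(serial)` is `False` and `sessions_overlap(parallel)` is `True`, which distinguishes the reported qwen3-vl behaviour from the working models.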

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.12.9

GiteaMirror added the bug label 2026-04-22 17:46:00 -05:00

Reference: github-starred/ollama#34318