[GH-ISSUE #12820] qwen3-vl:32b: /set nothink does not work and parallel requests are not supported #70554

Closed
opened 2026-05-04 21:57:13 -05:00 by GiteaMirror · 1 comment

Originally created by @chuanyu1 on GitHub (Oct 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12820

### What is the issue?

1. Unable to switch thinking mode on or off:

```
~/wwork/ollama_w$ ollama -v
ollama version is 0.12.7-rc0
>>> hi /no_think
Thinking...
Okay, the user is sending "hi /no_think" again. Let me look back at the history to
understand what's going on.

First, the user started with "hi", then "set", then "hi" again. The assistant
asked for clarification about "set". Then the user sent "hi /nothink" and "hi
/no_think". The assistant responded^C


>>> /set nothink
Set 'nothink' mode.
>>> hi
Thinking...
Okay, the user just said "hi" again after a previous interaction where they first
said "hi", then "set", and now back to "hi". Hmm, this feels like a pattern.

Let me recap:
First message was a simple "hi" - I responded warmly with an emoji and offered
help.
Then they sent "set" - I asked for clarification since it was ambiguous.
Now they're back to "hi" - probably testing if I recognize the pattern or just
being polite.
```
2. Parallel requests are not supported. Other models (e.g. qwen2.5:32b) do not have this problem.


### Relevant log output

```shell
```

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.12.7-rc0

GiteaMirror added the bug label 2026-05-04 21:57:13 -05:00

@rick-github commented on GitHub (Oct 29, 2025):

qwen3-vl comes in thinking and non-thinking variants. If you want non-thinking, use the [instruct](https://ollama.com/library/qwen3-vl:32b-instruct) variant.

Set [`OLLAMA_NUM_PARALLEL`](https://docs.ollama.com/faq#how-does-ollama-handle-concurrent-requests%3F) to increase parallelism.
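A minimal Python sketch for checking whether several requests to the instruct variant are actually served concurrently. It assumes a local Ollama server on the default port 11434 that was started with `OLLAMA_NUM_PARALLEL` set (for example `OLLAMA_NUM_PARALLEL=4 ollama serve`). The helper names `build_payload`, `generate`, and `run_concurrently` are hypothetical; only the `/api/generate` endpoint and its `model`, `prompt`, `stream`, and `response` fields come from Ollama's REST API.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: local Ollama server on the default port, started with
# OLLAMA_NUM_PARALLEL set (e.g. `OLLAMA_NUM_PARALLEL=4 ollama serve`).
OLLAMA_URL = "http://localhost:11434/api/generate"
# The non-thinking (instruct) variant suggested in the reply above.
MODEL = "qwen3-vl:32b-instruct"


def build_payload(prompt: str, model: str = MODEL) -> bytes:
    """Encode a non-streaming /api/generate request body (hypothetical helper)."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def generate(prompt: str) -> str:
    """POST one prompt to the server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),  # passing data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def run_concurrently(send, prompts, workers=4):
    """Call `send` on every prompt from a thread pool.

    Returns (results in prompt order, wall-clock seconds for the whole batch).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(send, prompts))
    return results, time.perf_counter() - start
```

For example, `run_concurrently(generate, ["hi"] * 4)` returns the four answers and the elapsed time. If the elapsed time is close to that of a single request, the server handled them in parallel; if it is roughly four times longer, the requests were most likely queued one after another.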
Reference: github-starred/ollama#70554