[GH-ISSUE #13353] Qwen3VLRenderer/Parser ignores think API parameter #70877

Closed
opened 2026-05-04 23:19:38 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @yuu104 on GitHub (Dec 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13353

## What is the issue?

When using the `qwen3-vl` model with `RENDERER qwen3-vl-thinking` (the default for the official qwen3-vl model), the `think` API parameter is completely ignored.

**Expected behavior**:

- `think: true` → thinking mode enabled
- `think: false` → thinking mode disabled
- `think` not specified → thinking mode disabled (consistent with `CogitoRenderer`)

**Actual behavior**: The model always operates in thinking mode regardless of the `think` parameter, resulting in slow responses (50+ seconds).

### API request example

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3-vl",
  "messages": [{"role": "user", "content": "Hello"}],
  "think": false,
  "stream": false
}'
```

Even with `think: false`, the model still enters thinking mode.

## Root cause (code analysis)

### 1. Renderer issue

`model/renderers/qwen3vl.go:74`:

```go
func (r *Qwen3VLRenderer) Render(messages []api.Message, tools []api.Tool, _ *api.ThinkValue) (string, error) {
```

The `thinkValue` parameter is discarded with `_`. The renderer uses only `r.isThinking`, which is set at construction time based on the renderer name (`qwen3-vl-thinking` vs. `qwen3-vl-instruct`).

### 2. Parser issue

`model/parsers/qwen3vl.go:57-61`:

```go
func (p *Qwen3VLParser) Init(tools []api.Tool, lastMessage *api.Message, thinkValue *api.ThinkValue) []api.Tool {
    p.tools = tools
    p.setInitialState(lastMessage)  // thinkValue is NOT passed here
    return tools
}
```

`thinkValue` is received but never passed to `setInitialState`, so it has no effect.

## Comparison with working implementation

`CogitoRenderer` correctly respects the `think` parameter (`model/renderers/cogito.go:14-20`):

```go
func (r *CogitoRenderer) Render(messages []api.Message, tools []api.Tool, thinkValue *api.ThinkValue) (string, error) {
    // ...
    // thinking is enabled: model must support it AND user must request it (true)
    enableThinking := r.isThinking && (thinkValue != nil && thinkValue.Bool())
    // ...
}
```

This pattern should be applied to `Qwen3VLRenderer` and `Qwen3VLParser` as well.
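The gating predicate can be sketched in isolation. In the sketch below, `ThinkValue` is a simplified stand-in for `api.ThinkValue` (the real type also carries non-boolean thinking levels); it exists only to show how the three request cases resolve:

```go
package main

import "fmt"

// ThinkValue is a simplified stand-in for api.ThinkValue, reduced to the
// boolean case that matters for this issue.
type ThinkValue struct{ Value bool }

func (t *ThinkValue) Bool() bool { return t.Value }

// enableThinking mirrors the CogitoRenderer check: the model must support
// thinking (isThinking) AND the caller must explicitly request it.
func enableThinking(isThinking bool, tv *ThinkValue) bool {
	return isThinking && tv != nil && tv.Bool()
}

func main() {
	fmt.Println(enableThinking(true, &ThinkValue{true}))  // true: supported and requested
	fmt.Println(enableThinking(true, &ThinkValue{false})) // false: explicitly disabled
	fmt.Println(enableThinking(true, nil))                // false: not specified
	fmt.Println(enableThinking(false, &ThinkValue{true})) // false: model lacks support
}
```

Note that a nil `thinkValue` (the "not specified" case) disables thinking, which is what gives the Cogito-consistent default.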

## Proposed fix

Apply the same pattern as `CogitoRenderer`: thinking is enabled only when the model supports it AND the user explicitly requests it.

### Renderer fix

```go
func (r *Qwen3VLRenderer) Render(messages []api.Message, tools []api.Tool, thinkValue *api.ThinkValue) (string, error) {
    // thinking is enabled: model must support it AND user must request it
    enableThinking := r.isThinking && (thinkValue != nil && thinkValue.Bool())
    // Use enableThinking instead of r.isThinking throughout the method
    // ...
}
```

### Parser fix

```go
func (p *Qwen3VLParser) Init(tools []api.Tool, lastMessage *api.Message, thinkValue *api.ThinkValue) []api.Tool {
    p.tools = tools
    p.setInitialState(lastMessage, thinkValue)  // Pass thinkValue
    return tools
}

func (p *Qwen3VLParser) setInitialState(lastMessage *api.Message, thinkValue *api.ThinkValue) {
    prefill := lastMessage != nil && lastMessage.Role == "assistant"

    // Check both hasThinkingSupport AND thinkValue (same pattern as CogitoRenderer)
    thinkingEnabled := p.HasThinkingSupport() && (thinkValue != nil && thinkValue.Bool())

    if !thinkingEnabled {
        p.state = CollectingContent
        return
    }
    // ...
}
```
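The proposed `setInitialState` change can be exercised with a self-contained sketch. The state constants and `ThinkValue` below are hypothetical, simplified mirrors of the names used in the snippets above, not the real ollama types; the sketch only demonstrates that the initial parser state would now depend on the request:

```go
package main

import "fmt"

// Hypothetical parser states mirroring the names in the issue's snippets.
type state int

const (
	CollectingContent state = iota
	CollectingThinking
)

// ThinkValue is a simplified stand-in for api.ThinkValue.
type ThinkValue struct{ Value bool }

func (t *ThinkValue) Bool() bool { return t.Value }

// initialState applies the proposed gate: start in the thinking state only
// when the parser supports thinking AND the request opted in.
func initialState(hasThinkingSupport bool, tv *ThinkValue) state {
	if hasThinkingSupport && tv != nil && tv.Bool() {
		return CollectingThinking
	}
	return CollectingContent
}

func main() {
	fmt.Println(initialState(true, &ThinkValue{false}) == CollectingContent) // true: think: false
	fmt.Println(initialState(true, nil) == CollectingContent)                // true: think unspecified
	fmt.Println(initialState(true, &ThinkValue{true}) == CollectingThinking) // true: think: true
}
```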

## Behavior after fix

| `think` parameter | Current behavior | Expected behavior |
|-------------------|------------------|-------------------|
| `true` | thinking | thinking |
| `false` | thinking (bug) | non-thinking |
| not specified | thinking | non-thinking (consistent with Cogito) |

## Workaround

Change the Modelfile to use the non-thinking variants:

```
RENDERER qwen3-vl-instruct
PARSER qwen3-vl-instruct
```

Then recreate the model:

```bash
ollama create your-model -f Modelfile
```

## Environment

- **OS**: Any
- **Ollama version**: Latest (main branch; also affects released versions)

## Related issues

- #10961 - Enable/disable thinking through modelfiles (feature request)
- #10964 - `think: false` output is unstable (different root cause: model behavior)
- #11712 - qwen3:235b `/nothink` doesn't work

@rick-github commented on GitHub (Dec 6, 2025):

qwen3-vl is not a hybrid model. If you don't want thinking, use the [instruct](https://ollama.com/library/qwen3-vl:8b-instruct-q4_K_M) version of the model.

@yuu104 commented on GitHub (Dec 6, 2025):

@rick-github
Thank you for the clarification. I understand now that qwen3-vl is not a hybrid model like qwen3, and users should choose between the thinking and instruct variants at the model level rather than using the think API parameter.

Reference: github-starred/ollama#70877