[GH-ISSUE #11327] Client's chat function doesn't handle the "think" argument properly for Qwen 3 #33234

Closed
opened 2026-04-22 15:42:21 -05:00 by GiteaMirror · 16 comments

Originally created by @mattans on GitHub (Jul 8, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11327

What is the issue?

When the think argument is not passed to Client.chat, Qwen3 defaults to reasoning.
When passing think=False, reasoning is turned off.
However, when passing think=True, reasoning is also turned off.

This probably also applies to AsyncClient and to the generate method.

See attached logs to reproduce.

Relevant log output

from ollama import Client

client = Client()
model = "qwen3:0.6b"
messages = [{"role": "user", "content": "What is the capital of France?"}]

response = client.chat(model=model, messages=messages)
content = response.message.content
print("Response:")
print(content)

print("**" * 20)

response = client.chat(model=model, messages=messages, think=True)
content = response.message.content
print("think=True Response:")
print(content)



Response:
<think>
Okay, the user is asking about the capital of France. I know that France has a capital city called Paris. But I should make sure I'm not mixing up any other options. Let me think... Are there any other cities that are considered capitals? I recall that sometimes people confuse other countries. For example, in some other nations, there's a city called "Monaco" or "Madrid." But no, in France, the capital is definitely Paris. I should also mention that the capital is a city of French origin to explain why it's called "capital." Plus, maybe add a bit about the history or significance to give more context. Wait, but the user just asked for the capital, so maybe just confirming the answer is enough. Let me check again. Yes, Paris is the capital. I think that's all.
</think>
The capital of France is **Paris**. It is a city of French origin and holds significant historical and cultural importance.
****************************************
think=True Response:
The capital of France is **Paris**. It is a major city in France and holds historical significance as the center of the old French empire.

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.5.1

GiteaMirror added the bug label 2026-04-22 15:42:21 -05:00

@DShaience commented on GitHub (Jul 8, 2025):

Happens to me as well


@rick-github commented on GitHub (Jul 8, 2025):

--- 11327.py.orig	2025-07-08 11:13:02.824301770 +0200
+++ 11327.py	2025-07-08 11:13:43.544112993 +0200
@@ -13,5 +13,7 @@
 
 response = client.chat(model=model, messages=messages, think=True)
 content = response.message.content
+thinking = response.message.thinking
 print("think=True Response:")
+print(thinking)
 print(content)

@gee-coder commented on GitHub (Jul 8, 2025):

@mattans @DShaience @rick-github I couldn't reproduce this problem using v0.9.3.


@rick-github commented on GitHub (Jul 8, 2025):

If think is enabled, the output from the model is split into two fields, thinking and content.


@gee-coder commented on GitHub (Jul 8, 2025):

> If think is enabled, the output from the model is split into two fields, thinking and content.

Yes, just like this

Image: https://github.com/user-attachments/assets/d5052c36-7ea5-4c03-a4ad-61e79cd6a743

@gee-coder commented on GitHub (Jul 8, 2025):

> If think is enabled, the output from the model is split into two fields, thinking and content.

Do you think it's necessary to change this?


@rick-github commented on GitHub (Jul 8, 2025):

Why?


@gee-coder commented on GitHub (Jul 8, 2025):

@rick-github Why not make "think" an independent field, distinct from "content", even when the "think" parameter is not passed?


@rick-github commented on GitHub (Jul 8, 2025):

Some clients were using the output from qwen3 before the thinking API changes, so leaving the content unchanged when the client doesn't set the think parameter maintains backwards compatibility.
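
For clients that leave think unset and therefore receive the trace inline, the two parts can still be separated on the client side. The following is only a minimal sketch: split_thinking is a hypothetical helper (not part of ollama-python), and it assumes the trace arrives as a single leading <think>...</think> block, as in the output posted at the top of this issue.

```python
import re

# Assumption: the reasoning trace is one leading <think>...</think> block,
# matching the output shown at the top of this issue.
_THINK_RE = re.compile(r"\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_thinking(content):
    """Return (thinking, answer); thinking is None when no <think> block is found.

    Hypothetical helper for illustration, not an ollama-python API.
    """
    match = _THINK_RE.match(content)
    if match is None:
        # No inline trace: hand the content back untouched.
        return None, content
    # Everything after the closing tag (and its trailing whitespace) is the answer.
    return match.group(1).strip(), content[match.end():]
```

Passing response.message.content through this helper would give the same two pieces that think=True exposes as thinking and content, without changing what older clients see.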


@gee-coder commented on GitHub (Jul 8, 2025):

@rick-github Sir, would this approach stay backward compatible with existing clients while also keeping the API's return results consistent?

Image: https://github.com/user-attachments/assets/f67deaef-d399-40b5-8fe4-fd4de36a82f0

@gee-coder commented on GitHub (Jul 8, 2025):

If possible, can I create a PR?


@rick-github commented on GitHub (Jul 8, 2025):

This does not maintain backwards compatibility.


@jchwenger commented on GitHub (Feb 2, 2026):

Hi there,

I'm encountering this as well:

from ollama import Client

client = Client()
model = "qwen3:0.6b"
messages = [{"role": "user", "content": "What is the capital of France?"}]

response = client.chat(model=model, messages=messages)
content = response.message.content
print("Response:")
print(content)  # displays the thinking trace

print("**" * 20)

response = client.chat(model=model, messages=messages, think=True)
thinking = response.message.thinking
content = response.message.content
print("think=True Response:")
print("thinking:")
print(thinking)  # ALWAYS NONE
print("content:")
print(content)  # displays the thinking trace

I was hoping to use response.message.thinking for the reasoning trace when streaming responses (as described at https://docs.ollama.com/capabilities/thinking), but the attribute, at least for qwen3:0.6b, is always None. Is that the desired behaviour?

ollama 0.15.4
ollama-python 0.6.1
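
For the streaming use case mentioned above, one possible way to gather the trace is to accumulate the two fields chunk by chunk. This is a rough sketch rather than the library's own recipe: collect_stream is a hypothetical helper, and the fake chunks below are stand-ins (built with SimpleNamespace) so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate thinking and content deltas from an iterable of chat chunks.

    Hypothetical helper for illustration; each chunk is expected to expose
    chunk.message.thinking and chunk.message.content.
    """
    thinking_parts, content_parts = [], []
    for chunk in chunks:
        message = chunk.message
        # Either field may be None or empty on any given chunk.
        if getattr(message, "thinking", None):
            thinking_parts.append(message.thinking)
        if getattr(message, "content", None):
            content_parts.append(message.content)
    return "".join(thinking_parts), "".join(content_parts)


# Stand-in chunks (assumed shape) so the sketch runs without an Ollama server.
fake_stream = [
    SimpleNamespace(message=SimpleNamespace(thinking="Paris is ", content="")),
    SimpleNamespace(message=SimpleNamespace(thinking="the capital.", content="")),
    SimpleNamespace(message=SimpleNamespace(thinking=None, content="Paris.")),
]
thinking, content = collect_stream(fake_stream)
```

Against a real server, the iterable would presumably be client.chat(model=model, messages=messages, think=True, stream=True) in place of fake_stream.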

@jchwenger commented on GitHub (Feb 2, 2026):

It looks like the same behaviour happens with qwen3:1.7b.


@rick-github commented on GitHub (Feb 2, 2026):

$ python3 11327.py
Response:
The capital of France is **Paris**.
****************************************
think=True Response:
thinking:
Okay, the user is asking for the capital of France. Let me think. France is a country in Europe, right? I remember that the capital is Paris. But wait, is there any chance it could be another city? No, I think Paris is the correct answer. Let me double-check. Oh, right, the capital is indeed Paris. I don't think there's any confusion here. Maybe the user is just confirming the answer. I should just state it clearly and maybe add a bit about the history or significance to make it more informative.

content:
The capital of France is **Paris**.

What's the output of ollama -v?


@jchwenger commented on GitHub (Feb 2, 2026):

Aah, @rick-github, sorry, rookie mistake: my ollama server wasn't up to date. Works now with 0.15.4, thanks!
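
Since the cause here was an outdated server, a quick version check before debugging may save time. The sketch below is illustrative only: parse_version and is_at_least are hypothetical helpers, and KNOWN_GOOD is set to 0.9.3 purely because that version was reported as working earlier in this thread, not because it is an official minimum.

```python
import re

# Assumption: 0.9.3 is used only because it was reported as working earlier
# in this thread; it is not an official minimum version.
KNOWN_GOOD = (0, 9, 3)


def parse_version(text):
    """Extract the first x.y.z version number found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least(text, minimum=KNOWN_GOOD):
    """Compare numerically, so 0.15.4 ranks above 0.9.3 (a string comparison would not)."""
    return parse_version(text) >= minimum
```

The text argument could be the line printed by ollama -v; the helper only looks for the first x.y.z number, so the exact wording around it should not matter.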

Reference: github-starred/ollama#33234