[GH-ISSUE #14645] format is ignored when think is disabled for qwen3.5 series #35248

Open
opened 2026-04-22 19:38:05 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @johnnyxwan on GitHub (Mar 5, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14645

What is the issue?

Format is ignored when think is disabled for qwen3.5 series

I put an example here and set temperature to 0 so that anyone can reproduce it.
Ollama version: 0.17.6
Model: qwen3.5:35b-a3b (3460ffeede54)

I believe this can be fixed with 1) proper output-token probability masking, and 2) an empty thinking tag `<think>\n\n</think>\n\n` in the template when thinking is disabled.
https://huggingface.co/Qwen/Qwen3.5-35B-A3B/blob/main/chat_template.jinja#L149

It appears that Ollama waits for the end-of-thinking token before it engages the probability masking for formatting. But since the tag is already closed in the template, the model never outputs that token. As a result, the masking is never applied.
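The suspected gating behavior can be modeled in a few lines. This is a hypothetical sketch of the logic described above, not Ollama's actual implementation: constrained decoding is armed only once the model itself emits `</think>`, so when the template has already closed the tag, the trigger never fires.

```python
def masking_engages(model_output_tokens):
    """Hypothetical model of the reported gating: constrained (JSON)
    decoding is armed only once the model itself emits '</think>'."""
    for tok in model_output_tokens:
        if tok == "</think>":
            return True  # from here on, format masking would apply
    return False

# think=True: the model closes its own thinking block, so masking engages.
print(masking_engages(["some", "reasoning", "</think>", "{", '"answer"']))  # True

# think=False: the template already emits '<think>\n\n</think>\n\n' before
# generation, so the model never outputs '</think>' and masking never fires.
print(masking_engages(["The", "sky", "appears", "blue"]))  # False
```

This matches the observed behavior: the `format` constraint works only in the runs where the model produces its own closing tag.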

Relevant output

[think = True, format = None]
Normal since format is not enabled.

```python
response = client.chat(
    model='qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=True,
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])
```

```shell
Thinking exists? True
===
The sky is blue due to a phenomenon called **Rayleigh scattering**. Here is a simple breakdown of how it works:

**1. Sunlight looks white, but isn't**
...
```

[think = False, format = None]
Again, normal since format is not enabled.

```python
response = client.chat(
    model='qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=False,
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])
```

```shell
Thinking exists? False
===
The sky appears blue due to a phenomenon called **Rayleigh scattering**.

Here is how it works:
...
```

[think = True, format = 'json']
Normal, which shows that format works when thinking is enabled.

```python
response = client.chat(
    model='qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=True,
    format='json',
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])
```

```shell
Thinking exists? True
===
{"answer":"The sky is blue due to a phenomenon called
...
```

[think = False, format = 'json']
JSON is not returned in this case, which shows that format is ignored when thinking is disabled.

```python
response = client.chat(
    model='qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=False,
    format='json',
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])
```

```shell
Thinking exists? False
===
The sky appears blue due to a phenomenon called **Rayleigh scattering**.

Here is how it works:
...
```
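The four runs above can be characterized with a single check: does the returned content parse as JSON? A minimal sketch, using stand-in strings for the observed outputs:

```python
import json

def returns_json(content: str) -> bool:
    """True if the response content is a single valid JSON value."""
    try:
        json.loads(content)
        return True
    except json.JSONDecodeError:
        return False

# Stand-ins for the responses observed above:
with_think = '{"answer": "The sky is blue due to Rayleigh scattering."}'
without_think = "The sky appears blue due to a phenomenon called **Rayleigh scattering**."

print(returns_json(with_think))     # True  (think=True, format='json')
print(returns_json(without_think))  # False (think=False: format is ignored)
```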

Ollama version

0.17.6

GiteaMirror added the bug label 2026-04-22 19:38:05 -05:00

@johnnyxwan commented on GitHub (Mar 6, 2026):

@majiayu000 thank you, this solves the problem exactly; looking forward to seeing it merged.


@majiayu000 commented on GitHub (Mar 6, 2026):

Happy to help


@arnoudius commented on GitHub (Mar 7, 2026):

Nice, experiencing the same issue


@BigArty commented on GitHub (Mar 18, 2026):

Why is this fix not included in 0.18, 0.18.1 and 0.18.2-rc? It seems like it was already available 2 weeks ago.


@johnnyxwan commented on GitHub (Mar 19, 2026):

Just ran some tests, the problem persists as expected in v0.18.2.


@johnnyxwan commented on GitHub (Mar 31, 2026):

Confirmed that the problem persists as expected in v0.19.0.


@BigArty commented on GitHub (Apr 7, 2026):

Still does not work in v0.20.2


@BigArty commented on GitHub (Apr 10, 2026):

Moreover, on 0.20.2 this combination of arguments ignores format completely even with think=True:

```python
response = client_llm.chat(
    model="qwen3.5:35b",
    messages=messages,
    context_length=40000,
    top_p=0.95,
    top_k=20,
    temperature=0.7,
    # repeat_penalty=1.5,
    max_tokens=10000,
    think=True,
    stream=True,
    format=ProjectOverview,
    tools=tool_list,
)
```

It also seems that tools are sometimes not called correctly: I see the <> tags in the output, so the format is not working for them either.
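Leaked tool-call markup like this can be detected mechanically. A sketch, assuming the Qwen-style `<tool_call>` wrapper (the exact tag name in the report above is elided): if the wrapper tags appear in the message content, the server failed to parse the invocation into a structured tool call.

```python
import re

# Hypothetical check: Qwen-family templates wrap tool invocations in
# <tool_call>...</tool_call>; if those tags leak into the message content,
# the call was not parsed into the structured tool-calls field.
LEAKED_TOOL_TAG = re.compile(r"</?tool_call>")

def has_leaked_tool_tags(content: str) -> bool:
    return bool(LEAKED_TOOL_TAG.search(content))

print(has_leaked_tool_tags('<tool_call>{"name": "lookup"}</tool_call>'))  # True
print(has_leaked_tool_tags("The sky is blue."))                           # False
```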


@BigArty commented on GitHub (Apr 10, 2026):

Just confirmed that all problems are still present on v0.20.5


@johnnyxwan commented on GitHub (Apr 18, 2026):

qwen3.6 is here, and this issue persists for qwen3.5, qwen3.6 and gemma4 in v0.21.0


@Orbiter commented on GitHub (Apr 18, 2026):

> qwen3.6 is here, and this issue persists for qwen3.5, qwen3.6 and gemma4 in v0.21.0

Yes, I observe the same; I have details about format testing in this benchmark file: https://github.com/Orbiter/project-euler-llm-benchmark/blob/main/benchmark.json

This shows that there must be a direct connection to the thinking ability of the model, because the frob models:

  • frob/qwen3.5-instruct:35b
  • frob/qwen3.5-instruct:122b
  • frob/qwen3.5-instruct:27b
  • frob/qwen3.5-instruct:9b

...are all format-enabled. So the problem is not tied to qwen3.5 itself, only to disabling thinking via the API. Btw: the models were tested using the OpenAI-compatible API.

However, this thinking model, also with thinking disabled via the API, does work, which is a bit confusing:

  • hf.co/mradermacher/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF:Q4_K_M-no_think
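For reference, testing through the OpenAI-compatible route exercises the same constraint via `response_format` rather than Ollama's native `format` parameter. A minimal request payload might look like the following sketch (the model name is one of the frob models listed above; the endpoint path follows the standard OpenAI chat-completions shape that Ollama's `/v1` route mirrors):

```python
import json

# Hypothetical payload for POST http://localhost:11434/v1/chat/completions;
# 'response_format' is the OpenAI-style counterpart of Ollama's 'format'.
payload = {
    "model": "frob/qwen3.5-instruct:9b",
    "messages": [{"role": "user", "content": "why is the sky blue"}],
    "response_format": {"type": "json_object"},
    "temperature": 0,
}

body = json.dumps(payload)
print(json.loads(body)["response_format"]["type"])  # json_object
```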

Reference: github-starred/ollama#35248