[GH-ISSUE #14493] Qwen 3.5 27B: Tool calling completely non-functional and repetition penalties silently ignored #35159

Open
opened 2026-04-22 19:27:47 -05:00 by GiteaMirror · 14 comments

Originally created by @BigBIueWhale on GitHub (Feb 27, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14493

What is the issue?

Qwen 3.5 27B: Tool calling completely non-functional and repetition penalties silently ignored

Qwen 3.5 27B is the first consumer-GPU-capable model to match GPT-5 mini on SWE-bench (72.4), with native multimodal support, 262K context, and agentic tool calling trained across 1 million RL environments. It fits on a single RTX 5090 at Q4_K_M. There is currently no comparable alternative at this size.

Three bugs in Ollama make its agentic capabilities — the model's primary differentiator — completely non-functional. All three are verified against source code (v0.17.1 through master 79917cf) and the HuggingFace ground truth template. None are fixable by end users.

Full source-level analysis with exact file paths, line numbers, and diffs: Inference Report


Bug 1: Repetition penalties are silently ignored

The Go runner's sampler has zero implementation of penalty sampling. repeat_penalty, presence_penalty, and frequency_penalty are accepted by the API without error and silently discarded. The model card explicitly recommends presence_penalty=1.5 to prevent repetition loops during thinking. Setting it via the API has no effect whatsoever.

The C++ runner (llamarunner) implements penalties correctly — but Qwen 3.5 is forced onto the Go runner via OllamaEngineRequired() and cannot use it.

This affects all models on the Go runner, not just Qwen 3.5.
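
For reference, this is roughly what llama.cpp-style penalty sampling does to the logits before a token is sampled. A minimal sketch for illustration only, not Ollama's code; the function name and signature are hypothetical:

// applyPenalties adjusts logits in place based on recently generated tokens.
// Sketch of llama.cpp-style penalty semantics; not taken from either runner.
func applyPenalties(logits []float32, recent []int32, repeat, presence, frequency float32) {
	counts := make(map[int32]int, len(recent))
	for _, tok := range recent {
		counts[tok]++
	}
	for tok, n := range counts {
		// repeat_penalty: multiplicatively push the logit toward "less likely".
		if logits[tok] > 0 {
			logits[tok] /= repeat
		} else {
			logits[tok] *= repeat
		}
		// frequency_penalty scales with the occurrence count;
		// presence_penalty is a flat cost for appearing at all.
		logits[tok] -= float32(n)*frequency + presence
	}
}

On the Go runner, none of this runs: the three parameters are parsed from the request and then never consulted.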

Full evidence: Missing Penalty Sampling


Bug 2: Tool calling uses the completely wrong format

The registry config blob sets renderer: "qwen3.5" / parser: "qwen3.5", which maps to the Qwen 3 Hermes-style JSON tool calling pipeline (Qwen3VLRenderer + Qwen3Parser).

Qwen 3.5 was not trained on this format. It was trained on the Qwen3-Coder XML format (<function=name><parameter=key>value</parameter></function>), as confirmed by the HuggingFace chat template.
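
For illustration, here is the same call in both formats (the get_weather tool and its argument are invented for this example). What the current pipeline renders and parses, Hermes-style JSON:

<tool_call>
{"name": "get_weather", "arguments": {"city": "Berlin"}}
</tool_call>

What the model was actually trained on, Qwen3-Coder XML:

<tool_call>
<function=get_weather>
<parameter=city>
Berlin
</parameter>
</function>
</tool_call>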

The correct pipeline (Qwen3CoderRenderer + Qwen3CoderParser) already exists in the codebase — it's just wired to "qwen3-coder" instead of "qwen3.5".

The system prompt, the format instruction, the tool call rendering in conversation history, and the output parser are all wrong. There are 6 concrete mismatches between what Ollama sends and what the model was trained on.

Full evidence: Tool Calling Format Mismatch


Bug 3: Unclosed </think> tag corrupts multi-turn tool calling prompts

When an assistant message has thinking + tool calls but no text content (the standard "think then call a tool" pattern), the renderer never emits </think>. The tool call is rendered inside an unclosed <think> block, corrupting every subsequent turn the model sees.
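
Schematically, the rendered history for such a turn looks something like this (tool call shown in the pipeline's Hermes JSON form, content invented for illustration); note that <think> is opened but never closed:

<|im_start|>assistant
<think>
I should check the weather before answering.
<tool_call>
{"name": "get_weather", "arguments": {"city": "Berlin"}}
</tool_call><|im_end|>

The tool call thus appears inside the thinking block, which never happens in the training format.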

The parser side of this was fixed in v0.17.3 (d98dda4, PR #14477) — the parser now correctly handles model output that has <tool_call> before </think>. But the renderer side remains broken: multi-turn prompts sent to the model still contain unclosed <think> tags. The tool-call thinking tests in qwen3vl_thinking_test.go (lines 119–323) are still commented out.

Full evidence: Bug 3 details


Status across releases

| Bug | v0.17.1 | v0.17.2 | v0.17.3 | v0.17.4 | master |
|-----|---------|---------|---------|---------|--------|
| Penalty sampling silently ignored | Present | Present | Present | Present | Present |
| Wrong tool call format | Present | Present | Present | Present | Present |
| Unclosed </think> (renderer) | Present | Present | Present | Present | Present |
| Unclosed </think> (parser) | Present | Present | Fixed | Fixed | Fixed |

Environment

  • Ollama v0.17.1 through master (79917cf, Feb 26, 2026 UTC)
  • Model: qwen3.5:27b-q4_K_M
  • Full report with source-level verification: qwen3.5_27b_inference_report.md

Edited (Feb 27, 2026 UTC): Found two additional issues while implementing fixes for the above three bugs in a fork based on v0.17.4 (cc90a035):

Bug 2 expanded — Coder pipeline has zero thinking support: Simply rewiring "qwen3.5" to Qwen3CoderRenderer/Qwen3CoderParser is insufficient. Both have zero thinking support — no <think>/</think> handling in the renderer, no thinking state machine in the parser. Qwen 3.5 requires thinking support for agentic use (the model card recommends enable_thinking=true), so the Coder pipeline needs to be extended with full thinking support before the rewiring is useful.
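
For a sense of scale, the minimum the parser side needs is a small state machine that tracks whether output is still inside the thinking block before it starts scanning for tool calls. An illustrative sketch under that assumption, not the fork's actual code (all names are made up), and it ignores tags split across stream chunks:

package qwen35sketch

import "strings"

// thinkState tracks where the stream is relative to the <think> block.
type thinkState int

const (
	beforeThink thinkState = iota // waiting for <think>
	inThink                       // inside <think>; emit as thinking
	afterThink                    // past </think>; safe to scan for <tool_call>
)

// step consumes one chunk of model output and returns the new state plus
// the portions to emit as thinking vs. regular content. For brevity it
// assumes tags arrive whole; real streaming code must buffer split tags.
func step(st thinkState, chunk string) (thinkState, string, string) {
	switch st {
	case beforeThink:
		if rest, ok := strings.CutPrefix(chunk, "<think>"); ok {
			return step(inThink, rest)
		}
		return afterThink, "", chunk
	case inThink:
		if thinking, rest, ok := strings.Cut(chunk, "</think>"); ok {
			return afterThink, thinking, rest
		}
		return inThink, chunk, ""
	default:
		return afterThink, "", chunk
	}
}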

Bug 4 — Missing generation prompt after tool call turns: When the last message is an assistant message with tool calls, the renderer treats it as a prefill (incomplete turn to be continued) and never emits <|im_end|> or the <|im_start|>assistant\n generation prompt. Root cause is the prefill variable in both qwen3coder.go:148 and qwen3vl.go:82 — it fires for any last assistant message including ones with tool calls. This breaks the entire tool call round-trip loop. Also affects qwen3-vl-instruct and qwen3-vl-thinking. Full evidence.
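
The shape of the fix is a one-line tightening of that condition, sketched here against the behavior described above rather than copied from either file (message field names are illustrative):

// Before (per the report): fires for any trailing assistant message,
// including ones that issued tool calls.
prefill := i == len(messages)-1 && msg.Role == "assistant"

// After: a trailing assistant message that already made tool calls is a
// complete turn; close it with <|im_end|> and emit the generation prompt.
prefill := i == len(messages)-1 && msg.Role == "assistant" && len(msg.ToolCalls) == 0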

Updated status table:

| Bug | v0.17.1 | v0.17.2 | v0.17.3 | v0.17.4 | master |
|-----|---------|---------|---------|---------|--------|
| Penalty sampling silently ignored | Present | Present | Present | Present | Present |
| Wrong tool call format (+ missing thinking support) | Present | Present | Present | Present | Present |
| Unclosed </think> (renderer) | Present | Present | Present | Present | Present |
| Unclosed </think> (parser) | Present | Present | Fixed | Fixed | Fixed |
| Missing generation prompt after tool calls | Present | Present | Present | Present | Present |

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

0.17.4

GiteaMirror added the bug label 2026-04-22 19:27:47 -05:00

@rick-github commented on GitHub (Feb 27, 2026):

$ ollama run qwen3.5:27b --experimental --experimental-yolo

This experimental version of Ollama has the bash tool enabled.
Models can read files on your computer, or run commands (after you allow them).

warning: yolo mode - all tool approvals will be skipped
>>> what is the time?
Thinking...
The user is asking about the current time. I don't have access to 
real-time information through the tools available to me. The only tool I 
have is a bash command executor, but I should first check if I can get 
time information through that.

Let me use the bash tool to get the current time.
...done thinking.


running: Bash: date
  Fr 27 Feb 2026 13:22:58 CET
  

Thinking...
I got the time from the system. The current time is Friday, February 27, 
2026, 13:22:58 CET (Central European Time).
...done thinking.

The current time is **Friday, February 27, 2026, 13:22:58 CET** (Central 
European Time).

Would you like me to convert this to a different time zone for you?

@BigBIueWhale commented on GitHub (Feb 27, 2026):

@rick-github Hi, Rick. Do you really want to be responsible for this model not reaching its full potential?

The fact is that with the current configuration, this model is receiving prompt formats that it simply has not been trained on. Especially in tool calling use cases.

My full report has all the necessary proof of these very real bugs.


@rick-github commented on GitHub (Feb 27, 2026):

Do you really want to be responsible for this model not reaching its full potential?

That's not up to me. Your claim is that "Tool calling completely non-functional", which is demonstrably incorrect. If you would like your bug report to be taken seriously, being accurate would be a good start.


@Jackallink commented on GitHub (Feb 28, 2026):

Thanks. It seems this issue explains the nature of issue #14492, though I used a qwen3.5:35b series model.
The latest Qwen3.5 models truly have powerful new features that can enhance current agent capabilities.
Hoping for complete compatibility.


@rick-github commented on GitHub (Mar 1, 2026):

https://github.com/ollama/ollama/pull/14537


@oleg-kiriyenko-2 commented on GitHub (Mar 4, 2026):

After updating to the latest version with the tool calling fix, it began to work. But it still does not work inside the thinking tag (I suppose). This is from opencode + qwen 3.5 9B:

Run server from this directory
Thinking: The user wants to run a server, but I don't have any information about what kind of project is in this directory. Let me first explore the current directory to understand what kind of project it is and what command needs to be run to start the server.
Let me check what's in this directory to determine how to run the server:
# List files in current directory
$ ls -la
итого 12
drwxrwxr-x   2 oleg oleg 4096 мар  3 21:29 .
drwxrwxr-x 111 oleg oleg 4096 мар  3 20:48 ..
-rw-rw-r--   1 oleg oleg  393 мар  4 20:36 server.js
Thinking: I can see there's a server.js file. Let me read it to understand what needs to be run.
<tool_call>
<function=read>
<parameter=filePath>
/home/oleg/projects/26/server.js
</parameter>
</function>
</tool_call>

The first call to list files in the directory is OK. But the subsequent tool call to read the file was not recognized and was shown as part of the thinking tag.


@oggixx commented on GitHub (Mar 8, 2026):

🦀 Coming from OpenClaw (AI agent framework) — can confirm this issue affects production setups.

Our Setup:

  • OpenClaw v2.0 with tool-calling agents
  • Qwen 3.5 27B via ollama/qwen3.5:397b-cloud
  • Tool definitions via /api/chat tools parameter

Observed Behavior:

  1. Tools are defined correctly in the request
  2. Qwen 3.5 acknowledges the tools in initial response
  3. When it's time to actually CALL a tool → silence / repetition / malformed output
  4. Repetition penalties appear to be ignored (model loops endlessly)

Impact:
This makes Qwen 3.5 unusable for agent workflows where tool calling is essential. We've had to fall back to:

  • Minimax M2.5 (works but less capable)
  • Kimi K2.5 (has session expiry issues every 5-10 min)

Workaround Attempted:
We tried the fix from #14603 (merged 2026-03-04) but the issue persists in our setup.

Questions for Ollama Team:

  1. Is this a known regression in the latest Qwen 3.5 builds?
  2. Does the fix in #14603 address the tool-calling issue or just the think-tags?
  3. Any ETA on a comprehensive fix for Qwen 3.5 tool calling?

Environment:

  • Ollama: Latest (self-hosted)
  • Model: ollama/qwen3.5:397b-cloud
  • Framework: OpenClaw (https://openclaw.ai)

Thanks for the great work on Ollama! Hope we can get this resolved soon. 🙏


@Yamakuzure commented on GitHub (Mar 9, 2026):

I do not know whether this is relevant to this issue or a different one, but whenever I have qwen3.5 (I use the 9b variant, though) try to develop a plan using the OpenCode planning agent, it always ends in something like:

Thinking: Let me examine the key files to understand the current CMakeLists.txt structure in each tier and the main Makefile dependencies.
<tool_call>
<function=read>
<parameter=filePath>
***redacted***
</parameter>
</function>
</tool_call>
<tool_call>
<function=read>
<parameter=filePath>
***redacted***
</parameter>
</function>
</tool_call>
<tool_call>
<function=read>
<parameter=filePath>
***redacted***
</parameter>
</function>
</tool_call>
<tool_call>
<function=bash>
<parameter=command>
***redacted***
</parameter>
<parameter=description>
List t2_connect connector files
</parameter>
</function>
</tool_call>

Ollama then stops, without any further output, and so does the agent.

Edit: I have tried with ollama-0.17.6 which has "fixed qwen 3.5 tool calling" in its release notes.


@yadav-prakhar commented on GitHub (Mar 9, 2026):

I've been running the 9B model in Qwen Code and Claude Code, served through ollama. I see that ollama can run tool calls in --experimental mode, but is it possible to launch experimental mode when using it via Claude Code or other coding agents?

Can something be done to invoke it in experimental mode when wiring it up? @rick-github


@SirNate0 commented on GitHub (Mar 30, 2026):

I think I'm running into the same bug with ollama version 0.18.2. If I run qwen3.5:122b-a10b I don't get any tool calls out, just a bunch of reasoning cut off by the length limit. Switching to qwen3-vl:32b, I am getting tool calls.

Which version was this fixed in?


@daper commented on GitHub (Mar 30, 2026):

@SirNate0 Looks like the last referenced PR solves it. So v0.19.0.


@yadav-prakhar commented on GitHub (Apr 2, 2026):

At this point in time, someone should take a look at the leaked Claude Code and make ollama-served LLMs work with it. Otherwise it would be a wasted opportunity.


@philipp-fuchsenberger commented on GitHub (Apr 2, 2026):

Root cause identified: strings.Contains("qwen3") in getParserName()/getRendererName()

I've been hitting this issue while building a multi-turn tool-calling agent and spent some time tracing the root cause.

Problem

In x/create/client/create.go, the functions getParserName() (line ~529) and getRendererName() (line ~580) use:

if strings.Contains(archLower, "qwen3") {
    return "qwen3"
}

This matches all Qwen3 variants — Qwen3, Qwen3-Coder, Qwen3.5, and Qwen3-VL — and assigns them the same parser/renderer pipeline. However, these models use different tool-call formats:

| Model | Trained on | Gets assigned |
|-------|------------|---------------|
| Qwen3 | Hermes JSON | qwen3 parser ✓ |
| Qwen3-Coder | XML (<tool_call>) | qwen3 parser ✗ |
| Qwen3.5 | XML (Qwen3-Coder style) | qwen3 parser ✗ |

What I observed

  • Direct API test (single turn): Tool calls work correctly — Ollama's built-in template handles it
  • Multi-turn conversations (3+ turns): Model falls back to printing XML tool calls as text instead of producing native tool_calls in the API response
  • After ollama create from safetensors: Wrong pipeline assigned, tool calling broken from the start

Suggested fix

Replace the strings.Contains heuristic with an explicit mapping:

var qwen3Parsers = map[string]string{
    "qwen3":       "qwen3",
    "qwen3coder":  "qwen3-coder",
    "qwen35":      "qwen3.5",  // or delegate to qwen3-coder
}

This would also prevent future regressions when new Qwen3 variants are released.

Note

This primarily affects the ollama create (safetensors import) path. Models pulled via ollama pull use registry manifests that include the correct parser/renderer config.

I'd be happy to submit a PR if this approach aligns with the project's direction.


Tested with: Ollama v0.6.x, qwen3.5:27b, qwen2.5:32b (works correctly), RTX 3090


@dhirajlochib commented on GitHub (Apr 2, 2026):

Fix submitted in #15224 — this addresses Bug 2 (wrong parser/renderer mapping for Qwen3 variants).

The root cause is that getParserName() and getRendererName() in x/create/client/create.go use strings.Contains(archLower, "qwen3") which matches all Qwen3 variants indiscriminately. The fix adds specific checks for Qwen3_5*, Qwen3Next*, and Qwen3VL* architectures before the generic qwen3 fallback, so each variant gets its correct parser and renderer.
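
In outline, the described approach orders the checks from most to least specific, along these lines (an illustrative sketch, not the PR's actual code; the "qwen3-vl" parser name is a guess, and the sketch omits Qwen3Next since its mapping isn't stated here):

// Check more specific architectures before the generic qwen3 fallback.
switch {
case strings.Contains(archLower, "qwen3_5"):
	return "qwen3.5"
case strings.Contains(archLower, "qwen3vl"):
	return "qwen3-vl" // hypothetical name for the VL pipeline
case strings.Contains(archLower, "qwen3coder"):
	return "qwen3-coder"
case strings.Contains(archLower, "qwen3"):
	return "qwen3"
}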

Reference: github-starred/ollama#35159