[GH-ISSUE #14493] Qwen 3.5 27B: Tool calling completely non-functional and repetition penalties silently ignored #35159

Open
opened 2026-04-22 19:27:47 -05:00 by GiteaMirror · 14 comments

Originally created by @BigBIueWhale on GitHub (Feb 27, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14493

What is the issue?

Qwen 3.5 27B: Tool calling completely non-functional and repetition penalties silently ignored

Qwen 3.5 27B is the first consumer-GPU-capable model to match GPT-5 mini on SWE-bench (72.4), with native multimodal support, 262K context, and agentic tool calling trained across 1 million RL environments. It fits on a single RTX 5090 at Q4_K_M. There is currently no comparable alternative at this size.

Three bugs in Ollama make its agentic capabilities — the model's primary differentiator — completely non-functional. All three are verified against source code (v0.17.1 through master 79917cf) and the HuggingFace ground truth template. None are fixable by end users.

Full source-level analysis with exact file paths, line numbers, and diffs: Inference Report


Bug 1: Repetition penalties are silently ignored

The Go runner's sampler has zero implementation of penalty sampling. repeat_penalty, presence_penalty, and frequency_penalty are accepted by the API without error and silently discarded. The model card explicitly recommends presence_penalty=1.5 to prevent repetition loops during thinking. Setting it via the API has no effect whatsoever.

The C++ runner (llamarunner) implements penalties correctly — but Qwen 3.5 is forced onto the Go runner via OllamaEngineRequired() and cannot use it.

This affects all models on the Go runner, not just Qwen 3.5.
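
For reference, this is roughly what llama.cpp-style penalty sampling does to the logits before a token is sampled. A minimal sketch for illustration only, not Ollama's code; the function name and signature are hypothetical:

// applyPenalties adjusts logits in place based on recently generated tokens.
// Sketch of llama.cpp-style penalty semantics; not taken from either runner.
func applyPenalties(logits []float32, recent []int32, repeat, presence, frequency float32) {
	counts := make(map[int32]int, len(recent))
	for _, tok := range recent {
		counts[tok]++
	}
	for tok, n := range counts {
		// repeat_penalty: multiplicatively push the logit toward "less likely".
		if logits[tok] > 0 {
			logits[tok] /= repeat
		} else {
			logits[tok] *= repeat
		}
		// frequency_penalty scales with the occurrence count;
		// presence_penalty is a flat cost for appearing at all.
		logits[tok] -= float32(n)*frequency + presence
	}
}

On the Go runner, none of this runs: the three parameters are parsed from the request and then never consulted.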

Full evidence: Missing Penalty Sampling


Bug 2: Tool calling uses the completely wrong format

The registry config blob sets renderer: "qwen3.5" / parser: "qwen3.5", which maps to the Qwen 3 Hermes-style JSON tool calling pipeline (Qwen3VLRenderer + Qwen3Parser).

Qwen 3.5 was not trained on this format. It was trained on the Qwen3-Coder XML format (<function=name><parameter=key>value</parameter></function>), as confirmed by the HuggingFace chat template.
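
For illustration, here is the same call in both formats (the get_weather tool and its argument are invented for this example). What the current pipeline renders and parses, Hermes-style JSON:

<tool_call>
{"name": "get_weather", "arguments": {"city": "Berlin"}}
</tool_call>

What the model was actually trained on, Qwen3-Coder XML:

<tool_call>
<function=get_weather>
<parameter=city>
Berlin
</parameter>
</function>
</tool_call>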

The correct pipeline (Qwen3CoderRenderer + Qwen3CoderParser) already exists in the codebase — it's just wired to "qwen3-coder" instead of "qwen3.5".

The system prompt, the format instruction, the tool call rendering in conversation history, and the output parser are all wrong. There are 6 concrete mismatches between what Ollama sends and what the model was trained on.

Full evidence: Tool Calling Format Mismatch


Bug 3: Unclosed </think> tag corrupts multi-turn tool calling prompts

When an assistant message has thinking + tool calls but no text content (the standard "think then call a tool" pattern), the renderer never emits </think>. The tool call is rendered inside an unclosed <think> block, corrupting every subsequent turn the model sees.
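
Schematically, the rendered history for such a turn looks something like this (tool call shown in the pipeline's Hermes JSON form, content invented for illustration); note that <think> is opened but never closed:

<|im_start|>assistant
<think>
I should check the weather before answering.
<tool_call>
{"name": "get_weather", "arguments": {"city": "Berlin"}}
</tool_call><|im_end|>

The tool call thus appears inside the thinking block, which never happens in the training format.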

The parser side of this was fixed in v0.17.3 (d98dda4, PR #14477) — the parser now correctly handles model output that has <tool_call> before </think>. But the renderer side remains broken: multi-turn prompts sent to the model still contain unclosed <think> tags. The tool-call thinking tests in qwen3vl_thinking_test.go (lines 119–323) are still commented out.

Full evidence: Bug 3 details


Status across releases

| Bug | v0.17.1 | v0.17.2 | v0.17.3 | v0.17.4 | master |
|-----|---------|---------|---------|---------|--------|
| Penalty sampling silently ignored | Present | Present | Present | Present | Present |
| Wrong tool call format | Present | Present | Present | Present | Present |
| Unclosed </think> (renderer) | Present | Present | Present | Present | Present |
| Unclosed </think> (parser) | Present | Present | Fixed | Fixed | Fixed |

Environment

  • Ollama v0.17.1 through master (79917cf, Feb 26, 2026 UTC)
  • Model: qwen3.5:27b-q4_K_M
  • Full report with source-level verification: qwen3.5_27b_inference_report.md

Edited (Feb 27, 2026 UTC): Found two additional issues while implementing fixes for the above three bugs in a fork based on v0.17.4 (cc90a035):

Bug 2 expanded — Coder pipeline has zero thinking support: Simply rewiring "qwen3.5" to Qwen3CoderRenderer/Qwen3CoderParser is insufficient. Both have zero thinking support — no <think>/</think> handling in the renderer, no thinking state machine in the parser. Qwen 3.5 requires thinking support for agentic use (the model card recommends enable_thinking=true), so the Coder pipeline needs to be extended with full thinking support before the rewiring is useful.
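
For a sense of scale, the minimum the parser side needs is a small state machine that tracks whether output is still inside the thinking block before it starts scanning for tool calls. An illustrative sketch under that assumption, not the fork's actual code (all names are made up), and it ignores tags split across stream chunks:

package qwen35sketch

import "strings"

// thinkState tracks where the stream is relative to the <think> block.
type thinkState int

const (
	beforeThink thinkState = iota // waiting for <think>
	inThink                       // inside <think>; emit as thinking
	afterThink                    // past </think>; safe to scan for <tool_call>
)

// step consumes one chunk of model output and returns the new state plus
// the portions to emit as thinking vs. regular content. For brevity it
// assumes tags arrive whole; real streaming code must buffer split tags.
func step(st thinkState, chunk string) (thinkState, string, string) {
	switch st {
	case beforeThink:
		if rest, ok := strings.CutPrefix(chunk, "<think>"); ok {
			return step(inThink, rest)
		}
		return afterThink, "", chunk
	case inThink:
		if thinking, rest, ok := strings.Cut(chunk, "</think>"); ok {
			return afterThink, thinking, rest
		}
		return inThink, chunk, ""
	default:
		return afterThink, "", chunk
	}
}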

Bug 4 — Missing generation prompt after tool call turns: When the last message is an assistant message with tool calls, the renderer treats it as a prefill (incomplete turn to be continued) and never emits <|im_end|> or the <|im_start|>assistant\n generation prompt. Root cause is the prefill variable in both qwen3coder.go:148 and qwen3vl.go:82 — it fires for any last assistant message including ones with tool calls. This breaks the entire tool call round-trip loop. Also affects qwen3-vl-instruct and qwen3-vl-thinking. Full evidence.
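
The shape of the fix is a one-line tightening of that condition, sketched here against the behavior described above rather than copied from either file (message field names are illustrative):

// Before (per the report): fires for any trailing assistant message,
// including ones that issued tool calls.
prefill := i == len(messages)-1 && msg.Role == "assistant"

// After: a trailing assistant message that already made tool calls is a
// complete turn; close it with <|im_end|> and emit the generation prompt.
prefill := i == len(messages)-1 && msg.Role == "assistant" && len(msg.ToolCalls) == 0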

Updated status table:

| Bug | v0.17.1 | v0.17.2 | v0.17.3 | v0.17.4 | master |
|-----|---------|---------|---------|---------|--------|
| Penalty sampling silently ignored | Present | Present | Present | Present | Present |
| Wrong tool call format (+ missing thinking support) | Present | Present | Present | Present | Present |
| Unclosed </think> (renderer) | Present | Present | Present | Present | Present |
| Unclosed </think> (parser) | Present | Present | Fixed | Fixed | Fixed |
| Missing generation prompt after tool calls | Present | Present | Present | Present | Present |

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

0.17.4

GiteaMirror added the bug label 2026-04-22 19:27:47 -05:00

@rick-github commented on GitHub (Feb 27, 2026):

$ ollama run qwen3.5:27b --experimental --experimental-yolo

This experimental version of Ollama has the bash tool enabled.
Models can read files on your computer, or run commands (after you allow them).

warning: yolo mode - all tool approvals will be skipped
>>> what is the time?
Thinking...
The user is asking about the current time. I don't have access to 
real-time information through the tools available to me. The only tool I 
have is a bash command executor, but I should first check if I can get 
time information through that.

Let me use the bash tool to get the current time.
...done thinking.


running: Bash: date
  Fr 27 Feb 2026 13:22:58 CET
  

Thinking...
I got the time from the system. The current time is Friday, February 27, 
2026, 13:22:58 CET (Central European Time).
...done thinking.

The current time is **Friday, February 27, 2026, 13:22:58 CET** (Central 
European Time).

Would you like me to convert this to a different time zone for you?

@BigBIueWhale commented on GitHub (Feb 27, 2026):

@rick-github Hi, Rick. Do you really want to be responsible for this model not reaching its full potential?

The fact is that with the current configuration, this model is receiving prompt formats that it simply has not been trained on. Especially in tool calling use cases.

My full report has all the necessary proof of these very real bugs.


@rick-github commented on GitHub (Feb 27, 2026):

Do you really want to be responsible for this model not reaching its full potential?

That's not up to me. Your claim is that "Tool calling completely non-functional", which is demonstrably incorrect. If you would like your bug report to be taken seriously, being accurate would be a good start.


@Jackallink commented on GitHub (Feb 28, 2026):

Thanks. It seems this issue explains the nature of issue #14492, though I used a qwen3.5:35b series model.
The latest Qwen3.5 models truly have powerful new features that can enhance current agent capabilities.
Hoping for complete compatibility.


@rick-github commented on GitHub (Mar 1, 2026):

https://github.com/ollama/ollama/pull/14537


@oleg-kiriyenko-2 commented on GitHub (Mar 4, 2026):

After updating to the latest version with the tool calling fix, it began to work. But it still does not work inside the thinking tag (I suppose). This is from opencode + qwen 3.5 9B:

Run server from this directory
Thinking: The user wants to run a server, but I don't have any information about what kind of project is in this directory. Let me first explore the current directory to understand what kind of project it is and what command needs to be run to start the server.
Let me check what's in this directory to determine how to run the server:
# List files in current directory
$ ls -la
итого 12
drwxrwxr-x   2 oleg oleg 4096 мар  3 21:29 .
drwxrwxr-x 111 oleg oleg 4096 мар  3 20:48 ..
-rw-rw-r--   1 oleg oleg  393 мар  4 20:36 server.js
Thinking: I can see there's a server.js file. Let me read it to understand what needs to be run.
<tool_call>
<function=read>
<parameter=filePath>
/home/oleg/projects/26/server.js
</parameter>
</function>
</tool_call>

The first call to list files in the directory is OK. But the subsequent tool call to read the file was not recognized and was shown as part of the thinking tag.


@oggixx commented on GitHub (Mar 8, 2026):

🦀 Coming from OpenClaw (AI agent framework) — can confirm this issue affects production setups.

Our Setup:

  • OpenClaw v2.0 with tool-calling agents
  • Qwen 3.5 27B via ollama/qwen3.5:397b-cloud
  • Tool definitions via /api/chat tools parameter

Observed Behavior:

  1. Tools are defined correctly in the request
  2. Qwen 3.5 acknowledges the tools in initial response
  3. When it's time to actually CALL a tool → silence / repetition / malformed output
  4. Repetition penalties appear to be ignored (model loops endlessly)

Impact:
This makes Qwen 3.5 unusable for agent workflows where tool calling is essential. We've had to fall back to:

  • Minimax M2.5 (works but less capable)
  • Kimi K2.5 (has session expiry issues every 5-10 min)

Workaround Attempted:
We tried the fix from #14603 (merged 2026-03-04) but the issue persists in our setup.

Questions for Ollama Team:

  1. Is this a known regression in the latest Qwen 3.5 builds?
  2. Does the fix in #14603 address the tool-calling issue or just the think-tags?
  3. Any ETA on a comprehensive fix for Qwen 3.5 tool calling?

Environment:

  • Ollama: Latest (self-hosted)
  • Model: ollama/qwen3.5:397b-cloud
  • Framework: OpenClaw (https://openclaw.ai)

Thanks for the great work on Ollama! Hope we can get this resolved soon. 🙏


@Yamakuzure commented on GitHub (Mar 9, 2026):

I do not know whether this is relevant to this issue or a different one, but whenever I have qwen3.5 (I use the 9b variant, though) try to develop a plan using the OpenCode planning agent, it always ends in something like:

Thinking: Let me examine the key files to understand the current CMakeLists.txt structure in each tier and the main Makefile dependencies.
<tool_call>
<function=read>
<parameter=filePath>
***redacted***
</parameter>
</function>
</tool_call>
<tool_call>
<function=read>
<parameter=filePath>
***redacted***
</parameter>
</function>
</tool_call>
<tool_call>
<function=read>
<parameter=filePath>
***redacted***
</parameter>
</function>
</tool_call>
<tool_call>
<function=bash>
<parameter=command>
***redacted***
</parameter>
<parameter=description>
List t2_connect connector files
</parameter>
</function>
</tool_call>

Ollama then stops, without any further output, and so does the agent.

Edit: I have tried with ollama-0.17.6 which has "fixed qwen 3.5 tool calling" in its release notes.


@yadav-prakhar commented on GitHub (Mar 9, 2026):

I've been running the 9B model in Qwen Code and Claude Code, served through ollama. I see that ollama can run tool calls in --experimental mode, but is it possible to launch experimental mode when using it via Claude Code or other coding agents?

Can something be done to invoke it in experimental mode when wiring it up? @rick-github


@SirNate0 commented on GitHub (Mar 30, 2026):

I think I'm running into the same bug with ollama version 0.18.2. If I run qwen3.5:122b-a10b I don't get any tool calls out, just a bunch of reasoning cut off by the length limit. Switching to qwen3-vl:32b, I am getting tool calls.

Which version was this fixed in?


@daper commented on GitHub (Mar 30, 2026):

@SirNate0 Looks like the last referenced PR solves it. So v0.19.0.


@yadav-prakhar commented on GitHub (Apr 2, 2026):

At this point in time, someone should take a look at the leaked Claude Code and make ollama-served LLMs work with it. Otherwise it would be a wasted opportunity.


@philipp-fuchsenberger commented on GitHub (Apr 2, 2026):

Root cause identified: strings.Contains("qwen3") in getParserName()/getRendererName()

I've been hitting this issue while building a multi-turn tool-calling agent and spent some time tracing the root cause.

Problem

In x/create/client/create.go, the functions getParserName() (line ~529) and getRendererName() (line ~580) use:

if strings.Contains(archLower, "qwen3") {
    return "qwen3"
}

This matches all Qwen3 variants — Qwen3, Qwen3-Coder, Qwen3.5, and Qwen3-VL — and assigns them the same parser/renderer pipeline. However, these models use different tool-call formats:

| Model | Trained on | Gets assigned |
|-------|------------|---------------|
| Qwen3 | Hermes JSON | qwen3 parser ✓ |
| Qwen3-Coder | XML (<tool_call>) | qwen3 parser ✗ |
| Qwen3.5 | XML (Qwen3-Coder style) | qwen3 parser ✗ |

What I observed

  • Direct API test (single turn): Tool calls work correctly — Ollama's built-in template handles it
  • Multi-turn conversations (3+ turns): Model falls back to printing XML tool calls as text instead of producing native tool_calls in the API response
  • After ollama create from safetensors: Wrong pipeline assigned, tool calling broken from the start

Suggested fix

Replace the strings.Contains heuristic with an explicit mapping:

var qwen3Parsers = map[string]string{
    "qwen3":       "qwen3",
    "qwen3coder":  "qwen3-coder",
    "qwen35":      "qwen3.5",  // or delegate to qwen3-coder
}

This would also prevent future regressions when new Qwen3 variants are released.

Note

This primarily affects the ollama create (safetensors import) path. Models pulled via ollama pull use registry manifests that include the correct parser/renderer config.

I'd be happy to submit a PR if this approach aligns with the project's direction.


Tested with: Ollama v0.6.x, qwen3.5:27b, qwen2.5:32b (works correctly), RTX 3090


@dhirajlochib commented on GitHub (Apr 2, 2026):

Fix submitted in #15224 — this addresses Bug 2 (wrong parser/renderer mapping for Qwen3 variants).

The root cause is that getParserName() and getRendererName() in x/create/client/create.go use strings.Contains(archLower, "qwen3") which matches all Qwen3 variants indiscriminately. The fix adds specific checks for Qwen3_5*, Qwen3Next*, and Qwen3VL* architectures before the generic qwen3 fallback, so each variant gets its correct parser and renderer.
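
In outline, the described approach orders the checks from most to least specific, along these lines (an illustrative sketch, not the PR's actual code; the "qwen3-vl" parser name is a guess, and the sketch omits Qwen3Next since its mapping isn't stated here):

// Check more specific architectures before the generic qwen3 fallback.
switch {
case strings.Contains(archLower, "qwen3_5"):
	return "qwen3.5"
case strings.Contains(archLower, "qwen3vl"):
	return "qwen3-vl" // hypothetical name for the VL pipeline
case strings.Contains(archLower, "qwen3coder"):
	return "qwen3-coder"
case strings.Contains(archLower, "qwen3"):
	return "qwen3"
}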

Reference: github-starred/ollama#35159