[GH-ISSUE #14601] Qwen3 tool calling via /api/chat tools parameter: malformed tool definitions (and soft switch thinking issue) #71525

Open
opened 2026-05-05 02:03:38 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @BogodaMM on GitHub (Mar 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14601

What is the issue?

Description:

There are two bugs in how Ollama constructs prompts for Qwen3 when tools are passed via the /api/chat tools parameter. Both bugs are absent when tools are embedded directly in the system prompt and the tools parameter is omitted. The bugs are visible in the prompt sent by Ollama to the model runner, which is logged when OLLAMA_DEBUG=2 is set (the prompt can also be seen by monitoring the runner's TCP socket via tcpdump).

Bug 1: Tool definitions serialised as Go structs rather than valid JSON

When the tools parameter is used, tool definitions in the model prompt are malformed:

{"type": "function", "function": {get_weather Get the current weather for a city {object [city] {"city":{"type":"string","description":"The name of the city"}}}}

The correct format per the official Qwen3 HuggingFace chat template:

{"type": "function", "function": {"name": "get_weather", "description": "Get the current weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "The name of the city"}}, "required": ["city"]}}}

The root cause is visible in the Qwen3 modelfile template:

{"type": "function", "function": {{ .Function }}}

The .Function variable is rendered using its default Go struct string representation rather than being serialised as JSON. There is no toJson or equivalent function available in Ollama's template engine to fix this at the modelfile level — it requires a code change in Ollama's Go source.
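For illustration, here is a self-contained Go sketch of the fix: registering a json function in a text/template FuncMap and piping .Function through it yields valid JSON instead of the Go struct dump. The Function type below is a simplified stand-in, not Ollama's actual internal type, and the template function name is whatever the engine registers (the comments below confirm Ollama ships a json pipe).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// Simplified stand-in for the tool function type the template renders
// (field names are illustrative, not Ollama's exact internal type).
type Function struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

var sample = Function{
	Name:        "get_weather",
	Description: "Get the current weather for a city",
	Parameters: map[string]any{
		"type": "object",
		"properties": map[string]any{
			"city": map[string]any{"type": "string", "description": "The name of the city"},
		},
		"required": []string{"city"},
	},
}

// renderTool renders the tool definition the way a fixed template would:
// {{ .Function | json }} serialises to JSON, whereas {{ .Function }} would
// print the struct with Go's default %v formatting.
func renderTool(fn Function) string {
	funcs := template.FuncMap{
		"json": func(v any) (string, error) {
			b, err := json.Marshal(v)
			return string(b), err
		},
	}
	tmpl := template.Must(template.New("tool").Funcs(funcs).Parse(
		`{"type": "function", "function": {{ .Function | json }}}`))
	var sb strings.Builder
	if err := tmpl.Execute(&sb, map[string]any{"Function": fn}); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	fmt.Println(renderTool(sample))
}
```

Note that encoding/json sorts map keys alphabetically, so the "parameters" object comes out in a stable but reordered form; the content matches the HuggingFace template's expected shape either way.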

Bug 2: Assistant tool call content stripped from conversation history

When the tools parameter is used and a conversation history containing previous tool calls is passed to Ollama, those tool calls are automatically stripped from assistant turns before the prompt is sent to the runner.

For example, attempting to pass this conversation history:

<|im_start|>assistant
<tool_call>
{"name": "get_weather", "arguments": {"city": "Paris"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": "14C", "condition": "Clear"}
</tool_response><|im_end|>

Results in the model receiving:

<|im_start|>assistant
<|im_end|>
<|im_start|>user
<tool_response>
{"temperature": "14C", "condition": "Clear"}
</tool_response><|im_end|>

The model therefore has no visibility of its own previous tool calls, only the responses. The stripping occurs upstream of the template renderer in Ollama's context building code and cannot be fixed via the modelfile.

Update: This was a client-side issue — see comments below.

Additional finding: redundant /think and /no_think text instructions

Ollama appends /think or /no_think to the model prompt when the think parameter is set. This leaks into the conversation history as visible text, and the model may comment on it. Per the official Qwen3 chat template, thinking mode is on by default and is switched off by appending <think>\n</think> to the prompt, which Ollama already does correctly. The /think and /no_think instructions are therefore redundant.

Unlike Bugs 1 and 2, this can be fixed at the modelfile level by removing the following lines from the template:

{{- if and $.IsThinkSet (eq $i $lastUserIdx) }}
   {{- if $.Think -}}
      {{- " "}}/think
   {{- else -}}
      {{- " "}}/no_think
   {{- end -}}
{{- end }}

Workaround
Both bugs can be avoided by bypassing the tools parameter and embedding tool definitions directly in the system prompt as JSON strings in the Hermes format, matching the official Qwen3 chat template.
The /think and /no_think issue can be addressed via a custom modelfile by removing the redundant text instructions from the template.
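As a sketch of the first workaround: the Hermes convention used by the official Qwen3 chat template places one JSON tool definition per line inside <tools></tools> tags in the system prompt. The surrounding wording below is paraphrased from memory, so check it against the actual template before relying on it.

```go
package main

import "fmt"

// hermesSystemPrompt embeds tool-definition JSON strings in a system prompt,
// one per line inside <tools></tools> tags, in the style of the official
// Qwen3 chat template (the framing sentences are paraphrased, not exact).
func hermesSystemPrompt(base string, toolJSON []string) string {
	s := base + "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\n" +
		"You are provided with function signatures within <tools></tools> XML tags:\n<tools>"
	for _, t := range toolJSON {
		s += "\n" + t
	}
	return s + "\n</tools>"
}

func main() {
	tool := `{"type": "function", "function": {"name": "get_weather", "description": "Get the current weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "The name of the city"}}, "required": ["city"]}}}`
	fmt.Println(hermesSystemPrompt("You are a helpful assistant.", []string{tool}))
}
```

The resulting string is then sent as the system message via /api/chat, with the tools parameter omitted entirely.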

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.17.5

Model

qwen3:8b

GiteaMirror added the bug label 2026-05-05 02:03:38 -05:00

@rick-github commented on GitHub (Mar 3, 2026):

The model you are looking at is a previous iteration of the qwen3 series. That version of the model used /think and /nothink to control thinking. ollama has the json function for serializing content and you can update the template, but realistically you would be better served moving to a more recent model.


@BogodaMM commented on GitHub (Mar 4, 2026):

Thanks for your reply. I've confirmed Bug 1 is fixed by replacing {{ .Function }} with {{ .Function | json }} in the modelfile template. From further investigation I've discovered Bug 2 was a client-side issue, caused by Ollama returning an empty message.content when a tool call is made. To provide the tool call details in the conversation history, the client simply needs to extract them from message.tool_calls instead.

Regarding the thinking issue: as per the documentation you referenced, I've verified that the /think and /no_think switches do work as dynamic switches during a chat session. They should not be confused with the session-level setting, which is already correctly handled by Ollama appending (or not) a <think>\n</think> pre-fill, as mentioned above. Note that, per the documentation, the /think and /no_think switches only work when the session allows thinking (i.e. the default, with no <think>\n</think> pre-fill). Unfortunately, by appending /think to the very end of the prompt, as the present modelfile template does, any /no_think the user sends in a message to temporarily switch off thinking is overridden by that end-of-prompt /think, completely preventing users from using the dynamic switching that is documented and intended for user control. The solution is to remove the offending lines from the modelfile as indicated above.

Regarding the model (qwen3:8b) I assumed it was supported as it's listed on the ollama webpage and referenced in the examples in the ollama tool calling documentation. What would be a recommended model?


@rick-github commented on GitHub (Mar 4, 2026):

qwen3:8b is supported, it's just old and the model creators have superseded it. Tool-using models can be found here. If you want to stick to the qwen family, Qwen has just released qwen3.5; reports so far indicate it's quite capable.


@oggixx commented on GitHub (Mar 8, 2026):

🦀 OpenClaw user here — can confirm this issue affects our production setup.

Our Experience:

  • Using Qwen 3.5 via ollama/qwen3.5:397b-cloud in OpenClaw framework
  • Tool definitions via /api/chat tools parameter
  • Seeing malformed tool calls and inconsistent behavior

Specific Issues:

  1. Tool definitions sometimes rejected as "malformed" even when following docs
  2. Qwen 3.5 acknowledges tools but fails to call them correctly
  3. Switching thinking mode on/off changes behavior unpredictably

Impact:
Makes Qwen 3.5 unreliable for production agent workflows. We've implemented fallbacks to other models but would prefer using Qwen 3.5 for its capabilities.

Related Issues:

  • #14493 (Qwen 3.5 tool calling non-functional)
  • #14603 (fix merged 2026-03-04 — but issue persists for us)

Question:
Is the fix in #14603 included in the latest ollama/qwen3.5:397b-cloud build? Or do we need to wait for a newer release?

Thanks for the work on Ollama! 🙏

Reference: github-starred/ollama#71525