[GH-ISSUE #11563] Support for zai-org/GLM-4.5 (Thinking & Non-Thinking Modes + Tool Use) #69690

Closed
opened 2026-05-04 18:50:41 -05:00 by GiteaMirror · 20 comments

Originally created by @zytoh0 on GitHub (Jul 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11563

Hi Ollama team,

I’d like to request support for the zai-org/GLM-4.5 model and its FP8 variant zai-org/GLM-4.5-FP8. These are hybrid reasoning models that provide two distinct modes:

  • Thinking mode: for complex reasoning and tool usage

  • Non-thinking mode: for fast, immediate responses

It would be great if Ollama could support both modes and the model’s tool use capabilities. The FP8 version would also be valuable for users seeking optimized performance and reduced memory usage.
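The memory saving from FP8 is roughly proportional to bytes per weight. The back-of-the-envelope sketch below assumes about 355B total parameters for GLM-4.5 and about 106B for GLM-4.5-Air, per the zai-org model cards. It counts weights only and ignores KV cache and runtime overhead, so real usage will be higher.

```python
# Weight-only memory estimate: parameter count x bytes per parameter.
# Assumption: ~355B total parameters (GLM-4.5) and ~106B (GLM-4.5-Air),
# taken from the zai-org model cards. KV cache, activations and runtime
# overhead are ignored, so real memory use will be higher.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}

MODELS = {"GLM-4.5": 355e9, "GLM-4.5-Air": 106e9}


def weight_gb(n_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n_params in MODELS.items():
        print(f"{name}: bf16 ~{weight_gb(n_params, 'bf16'):.0f} GB, "
              f"fp8 ~{weight_gb(n_params, 'fp8'):.0f} GB")
```

Both models are mixture-of-experts. The smaller active-parameter count speeds up each token, but all experts still have to be loaded, so the total parameter count is what drives memory.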

Thanks for considering this request!

GiteaMirror added the model label 2026-05-04 18:50:41 -05:00

@rick-github commented on GitHub (Jul 29, 2025):

https://github.com/ggml-org/llama.cpp/issues/14921


@olumolu commented on GitHub (Jul 29, 2025):

Please support this, mainly the Air model; I want to run it...


@matbgn commented on GitHub (Aug 4, 2025):

Job done on the side of llama.cpp: https://github.com/ggml-org/llama.cpp/pull/14939

Just amazing what they did 👀


@fragrusti commented on GitHub (Aug 7, 2025):

Plus one. Antirez (aka Salvatore Sanfilippo) and others are saying that GLM-4.5 is better than OpenAI's gpt-oss. Thank you in advance


@FawadAbbas12 commented on GitHub (Aug 7, 2025):

+1


@stereifberger commented on GitHub (Aug 9, 2025):

It would be great if the new GLM models could be added.


@richard1912 commented on GitHub (Aug 9, 2025):

+1


@simfor99 commented on GitHub (Aug 11, 2025):

> Job done on the side of llama.cpp: ggml-org/llama.cpp#14939
>
> Just amazing what they did 👀

Thank you! Just uninstalled Ollama and installed llama.cpp!


@MichelRosselli commented on GitHub (Aug 11, 2025):

+1


@olumolu commented on GitHub (Aug 11, 2025):

https://huggingface.co/zai-org/GLM-4.5V
Please add support for GLM-4.5V, which is a fully vision-capable model. Thanks.


@cruzanstx commented on GitHub (Aug 12, 2025):

any moment now?


@rick-github commented on GitHub (Aug 12, 2025):

#11823


@olumolu commented on GitHub (Aug 15, 2025):

Will you provide these models?


@MichelRosselli commented on GitHub (Aug 15, 2025):

Finally, with release v0.11.5-rc2 (https://github.com/ollama/ollama/releases/tag/v0.11.5-rc2), support for the model has been added.
The only limitation is that, as explained in issue #5245 (https://github.com/ollama/ollama/issues/5245), Ollama still doesn’t support sharded models.

The working solution I’ve found is to use quantized, non-sharded models. For example, I was able to run it successfully with Unsloth’s quantized build hf.co/unsloth/GLM-4.5-Air-GGUF:Q2_K_XL, though performance is not amazing.

Alternatively, one could try merging the .gguf shards into a single .gguf file, but I haven’t tested this yet to confirm if it works.
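Shards can be merged with llama.cpp's gguf-split tool. The sketch below only builds the command line. It assumes the tool is installed as `llama-gguf-split`, that it accepts `--merge <first shard> <output>` and finds the remaining shards itself, and that the files follow the `-00001-of-0000N.gguf` naming. `merge_command` and the file names are hypothetical. It checks that the shard set is complete before returning the command.

```python
import re
import shlex
from pathlib import Path

# Assumption: shards use llama.cpp's gguf-split naming, "<stem>-00001-of-0000N.gguf".
SHARD_RE = re.compile(r"^(?P<stem>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.gguf$")


def merge_command(shard_files, output):
    """Build (without running) a `llama-gguf-split --merge` command line.

    Only the first shard is passed; the merge tool is expected to find the
    remaining shards next to it.
    """
    parsed = []
    for f in shard_files:
        m = SHARD_RE.match(Path(f).name)
        if m is None:
            raise ValueError(f"not a gguf-split shard: {f}")
        parsed.append((int(m["idx"]), int(m["total"]), m["stem"], f))
    if not parsed:
        raise ValueError("no shards given")
    stems = {stem for _, _, stem, _ in parsed}
    totals = {total for _, total, _, _ in parsed}
    if len(stems) != 1 or len(totals) != 1:
        raise ValueError("shards belong to different split sets")
    indices = sorted(idx for idx, _, _, _ in parsed)
    if indices != list(range(1, totals.pop() + 1)):
        raise ValueError(f"incomplete shard set: found indices {indices}")
    first = next(f for idx, _, _, f in parsed if idx == 1)
    return ["llama-gguf-split", "--merge", str(first), str(output)]


if __name__ == "__main__":
    cmd = merge_command(
        ["GLM-4.5-Air-Q4_K_M-00002-of-00002.gguf",
         "GLM-4.5-Air-Q4_K_M-00001-of-00002.gguf"],
        "GLM-4.5-Air-Q4_K_M.gguf",
    )
    print(shlex.join(cmd))
```

After the merge, a Modelfile with `FROM ./GLM-4.5-Air-Q4_K_M.gguf` can be imported with `ollama create <name> -f Modelfile`.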


@rick-github commented on GitHub (Aug 15, 2025):

> Alternatively, one could try merging the .gguf shards into a single .gguf file, but I haven’t tested this yet to confirm if it works.

It works, tested with unsloth/GLM-4.5-Air-GGUF:Q4_K_M. However the template is not configured for thinking so the <think> block appears in the output.

$ ollama run GLM-4.5-Air-Q4_K_M:latest hello
<think>Okay, the user just said "hello." That's a very simple greeting. Hmm, I 
should respond warmly to make them feel welcomed since it's their first interaction. 


I wonder if they're new here or just testing the waters. The brevity makes me think 
they might be in a hurry or unsure how to start. Maybe they'll follow up with 
something more specific soon. 

Better keep my reply light and open-ended - offer help but don't overload them. A 
smiley face could soften the tone since text lacks warmth otherwise. Should I ask 
about their day? No, that's too generic... better invite questions directly. 

Ah, right - end with an offer of assistance to nudge them toward actual needs if 
they have any. The "how can I assist" phrasing feels professional yet 
approachable.</think>Hello! 👋 How can I assist you today? Feel free to ask me 
anything or share what's on your mind—I'm here to help! 😊
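Until the template handles thinking, a client could split the reasoning from the answer itself. This is a minimal sketch, not Ollama's own parser; the helper name `split_thinking` is hypothetical. It also accepts output where the opening `<think>` tag is missing and only `</think>` appears.

```python
def split_thinking(raw: str) -> tuple:
    """Split raw model output into (thinking, content).

    Handles "<think>...</think>answer" as well as "...</think>answer" where the
    opening tag is missing. Output with no closing tag is returned as content.
    """
    head, sep, tail = raw.partition("</think>")
    if not sep:
        return "", raw.strip()
    thinking = head.strip()
    if thinking.startswith("<think>"):
        thinking = thinking[len("<think>"):].strip()
    return thinking, tail.strip()


if __name__ == "__main__":
    raw = '<think>Okay, the user just said "hello."</think>Hello! How can I assist you today?'
    thinking, content = split_thinking(raw)
    print("thinking:", thinking)
    print("content:", content)
```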

@rick-github commented on GitHub (Aug 15, 2025):

FROM GLM-4.5-Air-Q4_K_M:latest
PARAMETER stop <|system|>
PARAMETER stop <|user|>
PARAMETER stop <|assistant|>
TEMPLATE """
{{- $lastUserIdx := -1 }}
{{- range $i, $_ := .Messages }}
{{- if eq .Role "user" }}{{- $lastUserIdx = $i }}{{ end }}
{{- end -}}
[gMASK]<sop>
{{- if .System }}<|system|>
{{ .System }}{{ end }}

{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|user|>
{{ .Content }}
{{- else if eq .Role "assistant" }}<|assistant|>
{{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
<think>{{ .Thinking }}</think>
{{- end }}
{{ .Content }}
{{- end }}
{{- if and (ne .Role "assistant") $last }}<|assistant|>
{{ end }}
{{- end }}"""
$ ollama run glm-4.5:110b-air-thinking-q4_K_M hello
Thinking...
We are starting a new conversation. The user just said "hello".
 Since there's no specific task or question, I should respond politely and offer help.
 Let me keep it simple and friendly.
...done thinking.

Hello! 👋 How can I assist you today? Feel free to ask questions, request information, or
chat about anything on your mind—I'm here to help!
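The template only replays an assistant turn's reasoning when thinking was requested and the turn comes after the latest user message (or is the final message). Reasoning from earlier exchanges is dropped. The hypothetical helper below restates that condition in Python for checking which turns keep their `<think>` block. It is an illustration, not Ollama code.

```python
def rendered_thinking(messages, is_think_set):
    """Indices of assistant messages whose thinking the template re-renders.

    Restates the Modelfile condition
        and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))
    where $last is true only for the final message.
    """
    last_user = -1
    for i, msg in enumerate(messages):
        if msg["role"] == "user":
            last_user = i
    keep = []
    for i, msg in enumerate(messages):
        is_last = i == len(messages) - 1
        if (msg["role"] == "assistant" and is_think_set and msg.get("thinking")
                and (is_last or i > last_user)):
            keep.append(i)
    return keep


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "hello"},
        {"role": "assistant", "thinking": "greet back", "content": "Hi!"},
        {"role": "user", "content": "what is 10 ^ 2.3?"},
        {"role": "assistant", "thinking": "use the power tool", "content": ""},
        {"role": "tool", "content": "199.52623149688787"},
    ]
    print(rendered_thinking(history, is_think_set=True))
```

The effect is that reasoning survives across a tool-calling round within one user request but is not carried into later requests.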

@MichelRosselli commented on GitHub (Aug 15, 2025):

> FROM GLM-4.5-Air-Q4_K_M:latest
> PARAMETER stop <|system|>
> PARAMETER stop <|user|>
> PARAMETER stop <|assistant|>
> TEMPLATE """
> {{- $lastUserIdx := -1 }}
> {{- range $i, $_ := .Messages }}
> {{- if eq .Role "user" }}{{- $lastUserIdx = $i }}{{ end }}
> {{- end -}}
> [gMASK]<sop>
> {{- if .System }}<|system|>
> {{ .System }}{{ end }}
>
> {{- range $i, $_ := .Messages }}
> {{- $last := eq (len (slice $.Messages $i)) 1 }}
> {{- if eq .Role "user" }}<|user|>
> {{ .Content }}
> {{- else if eq .Role "assistant" }}<|assistant|>
> {{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
> <think>{{ .Thinking }}</think>
> {{- end }}
> {{ .Content }}
> {{- end }}
> {{- if and (ne .Role "assistant") $last }}<|assistant|>
> {{ end }}
> {{- end }}"""
>
> $ ollama run glm-4.5:110b-air-thinking-q4_K_M hello
> Thinking...
> We are starting a new conversation. The user just said "hello".
> Since there's no specific task or question, I should respond politely and offer help.
> Let me keep it simple and friendly.
> ...done thinking.
>
> Hello! 👋 How can I assist you today? Feel free to ask questions, request information, or
> chat about anything on your mind—I'm here to help!

I’ve tested it as well, works fine on my side too.
Thanks for sharing the chat template! 😄


@rick-github commented on GitHub (Aug 22, 2025):

Now with tools. The tool call format that the model was trained with doesn't really match the current Ollama tool call parser, so it may sometimes not get it right. But in my limited testing so far it's been OK.
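The template below asks the model to emit one JSON object per call inside `<tool_call></tool_call>`. To inspect raw completions outside Ollama's own parser, for example to see why a call was missed, a client-side extractor could look like this. It is a sketch, not Ollama's implementation; the function names are hypothetical. It uses `json.JSONDecoder.raw_decode`, so objects that are pretty-printed across several lines still decode.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
_DECODER = json.JSONDecoder()


def _json_objects(block: str):
    """Decode consecutive JSON values separated only by whitespace."""
    objs, pos = [], 0
    while True:
        while pos < len(block) and block[pos].isspace():
            pos += 1
        if pos == len(block):
            return objs
        obj, pos = _DECODER.raw_decode(block, pos)
        objs.append(obj)


def parse_json_tool_calls(text: str):
    """Return ([{"name", "arguments"}, ...], text with the tool_call blocks removed)."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for obj in _json_objects(block):
            calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
    return calls, TOOL_CALL_RE.sub("", text).strip()
```

A malformed object raises `json.JSONDecodeError`. A more forgiving client might catch that and fall back to treating the block as plain text.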

FROM GLM-4.5-Air-Q4_K_M:latest
PARAMETER stop <|system|>
PARAMETER stop <|user|>
PARAMETER stop <|assistant|>
TEMPLATE """
{{- $lastUserIdx := -1 }}
{{- range $i, $_ := .Messages }}
{{- if eq .Role "user" }}{{- $lastUserIdx = $i }}{{ end }}
{{- end -}}
[gMASK]<sop>
{{- if or .System .Tools }}<|system|>
{{- if .Tools }}
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{{ json .Function }}
{{ end }}
</tools>

For each function call, output the function name and arguments within <tool_call></tool_call> XML tags using the following JSON format:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>

{{ end }}
{{- if .System }}
{{ .System }}{{ end }}
{{- end }}

{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|user|>
{{ .Content }}
{{- else if eq .Role "assistant" }}<|assistant|>
{{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
<think>{{ .Thinking }}</think>
{{- end }}
{{- if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}
{{- else if eq .Role "tool" }}<|observation|>
<tool_response>
{{ .Content }}
</tool_response>
{{- end }}
{{- if and (ne .Role "assistant") $last }}<|assistant|>
{{ end }}
{{- end }}"""
$ ollama-tool.py --model glm-4.5:110b-air-thinking-tool-q4_K_M --tools add,power -p "what is 10 ^ 2.3 and 5 ^ 4.6?  Then add them together." -i
calling power({'x': 10, 'y': 2.3})
calling power({'x': 5, 'y': 4.6})
calling add({'l': [199.52623149688787, 1641.5798777523537]})
Perfect! I've calculated both exponential expressions and added them together.

The results are:
- 10^2.3 = 199.52623149688787
- 5^4.6 = 1641.5798777523537  
- Their sum = 1841.1061092492416</think>
Here are the calculations:

**10^2.3** = 199.53 (rounded to 2 decimal places)

**5^4.6** = 1,641.58 (rounded to 2 decimal places)

**Sum**: 199.53 + 1,641.58 = **1,841.11**

So the exact result is approximately 1841.1061092492416.
[
    {
        "role": "user",
        "content": "what is 10 ^ 2.3 and 5 ^ 4.6?  Then add them together."
    },
    {
        "role": "assistant",
        "thinking": "The user wants me to:\n1. Calculate 10^2.3 \n2. Calculate 5^4.6\n3. Add the two results together\n\nI have a power function that can calculate x^y, and an add function that can sum multiple numbers.\n\nLet me start by calculating both powers first.",
        "content": "I'll help you calculate those exponential expressions and then add them together.\n",
        "tool_calls": [
            "function=Function(name='power', arguments={'x': 10, 'y': 2.3})",
            "function=Function(name='power', arguments={'x': 5, 'y': 4.6})"
        ]
    },
    {
        "role": "tool",
        "content": "199.52623149688787"
    },
    {
        "role": "tool",
        "content": "1641.5798777523537"
    },
    {
        "role": "assistant",
        "thinking": "Now I have the results:\n- 10^2.3 = 199.52623149688787\n- 5^4.6 = 1641.5798777523537\n\nNow I need to add these two numbers together using the add function.",
        "content": "",
        "tool_calls": [
            "function=Function(name='add', arguments={'l': [199.52623149688787, 1641.5798777523537]})"
        ]
    },
    {
        "role": "tool",
        "content": "1841.1061092492416"
    },
    {
        "role": "assistant",
        "thinking": null,
        "content": "Perfect! I've calculated both exponential expressions and added them together.\n\nThe results are:\n- 10^2.3 = 199.52623149688787\n- 5^4.6 = 1641.5798777523537  \n- Their sum = 1841.1061092492416</think>\nHere are the calculations:\n\n**10^2.3** = 199.53 (rounded to 2 decimal places)\n\n**5^4.6** = 1,641.58 (rounded to 2 decimal places)\n\n**Sum**: 199.53 + 1,641.58 = **1,841.11**\n\nSo the exact result is approximately 1841.1061092492416.",
        "tool_calls": null
    }
]
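The ollama-tool.py script itself isn't shown, so the following is only a sketch of the client side of this loop. It runs each requested call and turns the result into a `{"role": "tool", "content": ...}` message, as in the transcript. The `power` and `add` implementations are hypothetical and were inferred from the call signatures above.

```python
import math


# Hypothetical tool implementations matching the calls in the transcript:
# power({'x': ..., 'y': ...}) and add({'l': [...]}).
def power(x: float, y: float) -> float:
    """Return x raised to the power y."""
    return math.pow(x, y)


def add(l: list) -> float:
    """Return the sum of a list of numbers."""
    return math.fsum(l)


TOOLS = {"power": power, "add": add}


def run_tool_calls(tool_calls):
    """Run each {"name", "arguments"} call and build the "tool" messages to send back."""
    results = []
    for call in tool_calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            content = f"error: unknown tool {call['name']!r}"
        else:
            content = str(fn(**call["arguments"]))
        results.append({"role": "tool", "content": content})
    return results
```

Returning an error string for an unknown tool, rather than raising, lets the model see the mistake on the next turn and retry.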

@MichelRosselli commented on GitHub (Aug 24, 2025):

> Now with tools. The tool call format that the model was trained with doesn't really match well with the current ollama tool call parser, so it may sometimes not get it right. but in my limited testing so far it's been OK.

Thanks for sharing your template!

I was also working on something similar and tried to replicate as closely as possible the functionality of the GLM-4.5-Air Jinja template (https://huggingface.co/zai-org/GLM-4.5-Air/blob/main/chat_template.jinja) and the format described in the paper (https://arxiv.org/pdf/2508.06471).

I noticed that the model’s native tool call format is quite different from what Ollama’s parser expects. Initially, I considered writing a custom parser to handle the model responses, but then I realized that Ollama removes everything wrapped inside <tool_call>...</tool_call> from the message content.

Do you know if there’s a way to bypass this behavior in Ollama?
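For reference, a custom parser for the native GLM-4.5 format might look roughly like the sketch below. It assumes the raw completion text is available, which, as noted above, Ollama currently does not expose. The helper names are hypothetical. Argument values are decoded as JSON when possible and otherwise kept as strings. Without the tool schema this is a guess: a string argument such as "123" would come back as an integer, so a fuller parser would consult the declared parameter types.

```python
import json
import re

CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ARG_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def _coerce(value: str):
    """Decode JSON-looking values (numbers, lists, objects); keep others as strings."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value


def parse_glm_tool_calls(text: str):
    """Parse '<tool_call>name<arg_key>k</arg_key><arg_value>v</arg_value>...</tool_call>'.

    Returns ([{"name", "arguments"}, ...], text with the tool_call blocks removed).
    """
    calls = []
    for body in CALL_RE.findall(text):
        name = body.split("<arg_key>", 1)[0].strip()
        args = {k.strip(): _coerce(v.strip()) for k, v in ARG_RE.findall(body)}
        calls.append({"name": name, "arguments": args})
    return calls, CALL_RE.sub("", text).strip()
```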

For reference, here’s the template I put together with tool calls in the GLM-4.5 style:

FROM GLM-4.5-Air:Q4_K_M
PARAMETER stop <|system|>
PARAMETER stop <|user|>
PARAMETER stop <|assistant|>
TEMPLATE """[gMASK]<sop>

{{- if .Tools }}<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"function": {{ .Function }}}
{{- end }}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>
{{- end -}}

{{- $lastUserIdx := -1 }}
{{- range $i, $_ := .Messages }}
{{- if eq .Role "user" }}{{- $lastUserIdx = $i }}{{ end }}
{{- end -}}

{{- $prevWasTool := false -}}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- $curIsTool := eq .Role "tool" -}}
{{- $startToolBlock := and $curIsTool (not $prevWasTool) -}}

{{- if eq .Role "user" }}<|user|>
{{ .Content }}

{{- if and $.IsThinkSet (not $.Think) -}}
/nothink
{{- end -}}

{{- else if eq .Role "assistant" }}<|assistant|>
{{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) }}
<think>{{ .Thinking }}</think>
{{- else if $.IsThinkSet }}
<think></think>
{{- end }}
{{- if .Content }}
{{ .Content }}
{{- end -}}

{{ if .ToolCalls }}
{{- range .ToolCalls }}
<tool_call>{{ .Function.Name }}
{{- range $key, $value := .Function.Arguments }}
<arg_key>{{ $key }}</arg_key>
<arg_value>{{ $value }}</arg_value>
{{- end }}
</tool_call>
{{- end }}
{{- end }}

{{- else if $curIsTool -}}
{{ if not $prevWasTool }}<|observation|>
{{- end }}
<tool_response>
{{ .Content }}
</tool_response>
{{- $prevWasTool = true -}}

{{- else if eq .Role "system" -}}<|system|>
{{ .Content }}
{{- end }}

{{- if and (ne .Role "assistant") $last }}<|assistant|>
{{- if and $.IsThinkSet (not $.Think) }}
<think></think>
{{- end -}}
{{- end }}

{{- $prevWasTool = $curIsTool -}}
{{- end }}"""
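The hypothetical helper below restates the `range .ToolCalls` loop of the GLM-style template in Python, to show what an assistant tool-call turn renders to. Keys are sorted because Go's text/template ranges over map keys in sorted order. One deliberate difference: the sketch JSON-encodes non-string values. This is an assumption about what the model expects, and it may be worth checking against the Jinja template.

```python
import json


def render_glm_tool_call(name: str, arguments: dict) -> str:
    """Render one tool call in the GLM-4.5 <arg_key>/<arg_value> layout.

    Keys are sorted to match Go's text/template map iteration order.
    Strings are written as-is; other values are JSON-encoded (an assumption).
    """
    lines = [f"<tool_call>{name}"]
    for key in sorted(arguments):
        value = arguments[key]
        rendered = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        lines.append(f"<arg_key>{key}</arg_key>")
        lines.append(f"<arg_value>{rendered}</arg_value>")
    lines.append("</tool_call>")
    return "\n".join(lines)
```

In the template, `{{ $value }}` uses Go's default formatting, which would likely print a list argument as `[1 2]` rather than `[1, 2]`.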

And here’s the version I adapted to be compatible with Ollama:

FROM GLM-4.5-Air:Q4_K_M
PARAMETER stop <|system|>
PARAMETER stop <|user|>
PARAMETER stop <|assistant|>
TEMPLATE """[gMASK]<sop>

{{- if .Tools }}<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}

{{- $lastUserIdx := -1 }}
{{- range $i, $_ := .Messages }}
{{- if eq .Role "user" }}{{- $lastUserIdx = $i }}{{ end }}
{{- end -}}

{{- $prevWasTool := false -}}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- $curIsTool := eq .Role "tool" -}}
{{- $startToolBlock := and $curIsTool (not $prevWasTool) -}}

{{- if eq .Role "user" }}<|user|>
{{ .Content }}

{{- if and $.IsThinkSet (not $.Think) -}}
/nothink
{{- end -}}

{{- else if eq .Role "assistant" }}<|assistant|>
{{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) }}
<think>{{ .Thinking }}</think>
{{- else if $.IsThinkSet }}
<think></think>
{{- end }}
{{- if .Content }}
{{ .Content }}
{{- end -}}

{{ if .ToolCalls }}
<tool_call>
{{- range .ToolCalls }}
{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end }}
</tool_call>
{{- end }}

{{- else if $curIsTool -}}
{{ if not $prevWasTool }}<|observation|>
{{- end }}
<tool_response>
{{ .Content }}
</tool_response>
{{- $prevWasTool = true -}}

{{- else if eq .Role "system" -}}<|system|>
{{ .Content }}
{{- end }}

{{- if and (ne .Role "assistant") $last }}<|assistant|>
{{- if and $.IsThinkSet (not $.Think) }}
<think></think>
{{- end -}}
{{- end }}

{{- $prevWasTool = $curIsTool -}}
{{- end }}"""
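For system, user and assistant messages without tools, both templates render the same prompt. The sketch below restates that path in Python to make the effect of the think setting visible. With think explicitly false, `/nothink` is appended directly to each user turn, and the final `<|assistant|>` is prefilled with an empty `<think></think>`. Assumptions: `render_prompt` is a hypothetical name; `think=None` stands for a request that leaves Ollama's `think` option unset (`.IsThinkSet` false); tool messages and `.Tools` are out of scope.

```python
def render_prompt(messages, think=None):
    """Approximate the template output for system/user/assistant messages (no tools)."""
    is_think_set = think is not None
    last_user = max((i for i, m in enumerate(messages) if m["role"] == "user"), default=-1)
    out = ["[gMASK]<sop>"]
    for i, m in enumerate(messages):
        last = i == len(messages) - 1
        role = m["role"]
        if role == "user":
            out.append("<|user|>\n" + m["content"])
            if is_think_set and not think:
                out.append("/nothink")  # appended with no separator, as in the template
        elif role == "assistant":
            out.append("<|assistant|>")
            if is_think_set and m.get("thinking") and (last or i > last_user):
                out.append("\n<think>" + m["thinking"] + "</think>")
            elif is_think_set:
                out.append("\n<think></think>")  # older reasoning is replaced by an empty block
            if m.get("content"):
                out.append("\n" + m["content"])
        elif role == "system":
            out.append("<|system|>\n" + m["content"])
        else:
            raise ValueError(f"role {role!r} not covered by this sketch")
        if role != "assistant" and last:
            out.append("<|assistant|>")  # generation prompt
            if is_think_set and not think:
                out.append("\n<think></think>")
    return "".join(out)
```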

Any feedback or suggestions would be very welcome!


@chigkim commented on GitHub (Oct 4, 2025):

Any progress update on GLM models? Now GLM-4.6 is out.

Reference: github-starred/ollama#69690