[GH-ISSUE #12468] [Model Request] Apriel-1.5-15b-Thinker #34044

Closed
opened 2026-04-22 17:16:15 -05:00 by GiteaMirror · 14 comments

Originally created by @joleszczuk-jpg on GitHub (Oct 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12468

Hello, please consider adding https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker, as it has very good results for a small model

@rick-github commented on GitHub (Oct 1, 2025):

```
ollama run hf.co/mradermacher/Apriel-1.5-15b-Thinker-GGUF:Q4_K_M
```

The template is basic though; it doesn't handle tool calls.
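
To check what template ships inside the GGUF before overriding it, `ollama show` can print it (a quick sketch, using the same tag as above):

```console
$ ollama show hf.co/mradermacher/Apriel-1.5-15b-Thinker-GGUF:Q4_K_M --template
```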

@rick-github commented on GitHub (Oct 1, 2025):

```modelfile
FROM hf.co/mradermacher/Apriel-1.5-15b-Thinker-GGUF:q4_k_m

SYSTEM "You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE]."

TEMPLATE """
{{- if or .System .Tools }}<|system|>
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools -}}
You are provided with function signatures within <available_tools></available_tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about the arguments. You should infer the argument values from previous user responses and the system message. Here are the available tools:
<available_tools>
{{ range .Tools }}
{{ . }}
{{ end }}
</available_tools>
Return all function calls as a list of json objects within <tool_calls></tool_calls> XML tags. Each json object should contain a function name and arguments as follows:
<tool_calls>[{"name": <function-name-1>, "arguments": <args-dict-1>}, {"name": <function-name-2>, "arguments": <args-dict-2>},...]</tool_calls>
{{- end }}
<|end|>
{{- end }}

{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}

{{- if eq .Role "user" }}
<|user|>
{{ .Content }}
<|end|>
{{- end }}

{{- if eq .Role "content" }}
<|content|>
{{ .Content }}
<|end|>
{{- end }}

{{- if eq .Role "assistant" }}
<|assistant|>
{{ .Content }}
{{- if and $.IsThinkSet .Thinking }}<thinking>{{ .Thinking }}</thinking>{{ end -}}
{{- if .ToolCalls }}
<tool_calls>[{{ range $j, $tc := .ToolCalls }}{{ if $j }},{{ end }}{"name":"{{ $tc.Function.Name }}","arguments":{{ $tc.Function.Arguments }}}{{ end }}]</tool_calls>
{{- end }}
<|end|>
</s>
{{- end }}

{{- if false }}Here are my reasoning steps:
{{ .Thinking }}
[BEGIN FINAL RESPONSE]
{{- end }}

{{- if eq .Role "tool" }}<|tool_result|>
{{ .Content }}
<|end|>
{{- end }}

{{- if and (ne .Role "assistant") $last }}
<|assistant|>
{{- if $.IsThinkSet }}
Here are my reasoning steps:
{{- if not $.Think }}

[BEGIN FINAL RESPONSE]
{{- end }}
{{- end }}
{{ end }}

{{- end -}}
"""
PARAMETER stop [END FINAL RESPONSE]
PARAMETER stop <|end|>
```

```console
$ ollama create apriel-1.5:15b-thinker-q4_K_M -f Modelfile
```

```console
$ ollama run apriel-1.5:15b-thinker hello
Thinking...
The user says "hello". This is a simple greeting. The assistant should respond politely, perhaps ask how can I help.

Given the policy: It's allowed content (non-violent). So we just reply with a friendly greeting.

We might also mention any relevant context? Not needed. Just answer.
...done thinking.

Hello! How can I assist you today?

$ ollama run apriel-1.5:15b-thinker --think=false hello
Hello! How can I help you today?

$ ./ollama-tool.py --model apriel-1.5:15b-thinker-q4_K_M --tools power,add --prompt "What's 5.6 ^ 4.3 and 2.6 ^ 7.1?  Then add them together." 
calling power({'x': 5.6, 'y': 4.3})
calling power({'x': 2.6, 'y': 7.1})
calling add({'l': [1648.9539073462636, 883.7120371106503]})
The results:
- 5.6 ^ 4.3 ≈ **1,648.95**
- 2.6 ^ 7.1 ≈ **883.71**

Added together: **≈ 2,532.67**.
```
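
(`ollama-tool.py` here is a local helper script, not part of ollama. For reference, a roughly equivalent raw API call might look like the sketch below; the `power` tool schema is assumed for illustration, not taken from the script.)

```console
$ curl http://localhost:11434/api/chat -d '{
  "model": "apriel-1.5:15b-thinker-q4_K_M",
  "messages": [{"role": "user", "content": "What is 5.6 ^ 4.3?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "power",
      "description": "Raise x to the power y",
      "parameters": {
        "type": "object",
        "properties": {
          "x": {"type": "number"},
          "y": {"type": "number"}
        },
        "required": ["x", "y"]
      }
    }
  }],
  "stream": false
}'
```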

@mikestaub commented on GitHub (Oct 1, 2025):

I pushed a version here: https://ollama.com/mikestaub/apriel-1.5

@rick-github commented on GitHub (Oct 1, 2025):

The "num_ctx": 131072 is going to catch some people off guard.

@mikestaub commented on GitHub (Oct 1, 2025):

@rick-github thanks for the catch, I changed it back to 262400

@rick-github commented on GitHub (Oct 1, 2025):

Not sure if you are making a joke or not. Setting `num_ctx` in the parameter list will cause ollama to allocate a context buffer to hold that many tokens when the model is loaded. Not many people will have 79G of VRAM/RAM available.
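
If a user does need a bigger window for a particular request, it can be set per call in the API options instead of being baked into the published model (a sketch):

```console
$ curl http://localhost:11434/api/generate -d '{
  "model": "apriel-1.5:15b-thinker-q4_K_M",
  "prompt": "hello",
  "options": {"num_ctx": 32768}
}'
```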

@mikestaub commented on GitHub (Oct 1, 2025):

Ah thanks for the clarification, I just removed it entirely then. I thought that was the way we told ollama what the max window size *could* be.
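
For interactive sessions, the window can also be raised on the fly from the REPL rather than in the Modelfile (a sketch):

```console
$ ollama run apriel-1.5:15b-thinker
>>> /set parameter num_ctx 32768
```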

@brianjking commented on GitHub (Oct 2, 2025):

Following. Which is the best quant for vision + text?

@rick-github commented on GitHub (Oct 2, 2025):

FP16.

@brianjking commented on GitHub (Oct 2, 2025):

I understand that, but there are several GGUFs people have created, and it seems the chat template isn't yet fully implemented in them.

I know ServiceNow says in the Apriel paper to use their system prompt and to provide all additional context in the user message.

@rick-github commented on GitHub (Oct 2, 2025):

The template above is derived from the original [chat_template.json](https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker/blob/main/chat_template.json).

@rick-github commented on GitHub (Oct 2, 2025):

Updated the template to remove an extraneous end token and add a missing EOS token.

@maternion commented on GitHub (Oct 4, 2025):

@rick-github If multimodal models like this can just work with the correct template and `ollama create`, why can't the same thing be done for Qwen3-VL, for example? Apriel was made using Pixtral, right? So is Pixtral supported by Ollama, or is there some other reason?

@rick-github commented on GitHub (Oct 4, 2025):

It's not a matter of the template, it's a matter of ollama (and other inference engines) supporting the architecture of the model. In this case (and with Pixtral), the text architecture is `llava`, which has been supported by ollama for a long time, and `pixtral` for the vision architecture, which is a more recent addition. Qwen3-VL has a different architecture, `qwen3_vl_moe`, which is not yet supported. You can check the architecture of a new model by looking in the `config.json` file of the safetensors format for the [`model_type`](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct/blob/main/config.json#L6) field.
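
A quick way to check from the command line (a sketch; the `raw/main` path is Hugging Face's raw-file URL pattern):

```console
$ curl -s https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker/raw/main/config.json | grep model_type
```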

The template here is just for enabling the tool and think APIs for ollama.

As an example of the differences in templates, you can run [Pixtral](https://huggingface.co/ggml-org/pixtral-12b-GGUF/tree/main) using this Modelfile:

```modelfile
FROM pixtral-12b-Q4_K_M.gguf
FROM mmproj-pixtral-12b-Q8_0.gguf
TEMPLATE [INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]
PARAMETER stop [INST]
PARAMETER stop [/INST]
```

Same architecture, different template.
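
Creating and running it would then look something like this (a sketch; the model name and image path are hypothetical):

```console
$ ollama create pixtral:12b -f Modelfile
$ ollama run pixtral:12b "Describe this image: ./photo.png"
```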
