[GH-ISSUE #12126] [Model Request] NVIDIA-Nemotron-Nano-9B-v2 #54570

Closed
opened 2026-04-29 06:23:15 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @BoguesUser on GitHub (Aug 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12126

Nemotron support has been added upstream to llama.cpp [in b6316](https://github.com/ggml-org/llama.cpp/releases/tag/b6315).

Any plans to also add it to ollama?

GiteaMirror added the model label 2026-04-29 06:23:15 -05:00

@rick-github commented on GitHub (Aug 30, 2025):

It will be added when the next vendor sync happens.


@NEWbie0709 commented on GitHub (Sep 2, 2025):

Can I know when the next vendor sync will be?


@donhuvy commented on GitHub (Sep 3, 2025):

I experimented with NVIDIA Nemotron 9B v2 today; it is a great reasoning model with only 9B parameters. Let's integrate it into Ollama.


@NEWbie0709 commented on GitHub (Sep 5, 2025):

I think Ollama was updated today, but it still doesn’t support the Nemotron model.


@ithax-wb commented on GitHub (Sep 8, 2025):

It’s a hybrid model. Can you also support the `think` and `no_think` options in the API?

P.S. There is also a 12B variant of this model.


@rick-github commented on GitHub (Sep 14, 2025):

Vendor sync in progress in https://github.com/ollama/ollama/pull/12245.


@Chovus13 commented on GitHub (Oct 7, 2025):

Please friends, add this model to Ollama. Thanks in advance


@jozzo402 commented on GitHub (Oct 14, 2025):

Vendor sync in progress in #12245.

The linked pull request has been merged, is this model supported now?


@harbachonak commented on GitHub (Oct 17, 2025):

Hi! @rick-github Are there any updates on this? Can't find the model yet


@rick-github commented on GitHub (Oct 17, 2025):

The model can be pulled from hf.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-9B-v2-GGUF. However, the template for that model is not great, I'm testing a rewrite and will post it here shortly.


@rick-github commented on GitHub (Oct 17, 2025):

```dockerfile
FROM hf.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q6_K_L
TEMPLATE """<SPECIAL_10>System
{{- if or .System .Tools }}
{{- if .System }}
{{ .System }}
{{ end }}
{{- if .Tools }}
You can use the following tools to assist the user if required:
<AVAILABLE_TOOLS>[
{{- range $i, $tool := $.Tools }}
{{- $last := eq (len (slice $.Tools $i)) 1 -}}
{"description": "{{ .Function.Description }}", "name": "{{ .Function.Name }}", "parameters": {"properties": {{ .Function.Parameters.Properties |json }}, "required": {{ .Function.Parameters.Required |json }} } }
{{- if not $last }}{{ `, ` }}
{{- end }}
{{- end -}}
]</AVAILABLE_TOOLS>

If you decide to call any tool(s), use the following format:
<TOOLCALL>[{"name": "tool_name1", "arguments": "tool_args1"}, {"name": "tool_name2", "arguments": "tool_args2"}]</TOOLCALL>

The user will execute tool-calls and return responses from tool(s) in this format:
<TOOL_RESPONSE>[{"tool_response1"}, {"tool_response2"}]</TOOL_RESPONSE>

Based on the tool responses, you can call additional tools if needed, correct tool calls if any errors are found, or just respond to the user.
{{ end }}
{{- end }}

{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}<SPECIAL_11>User
{{ .Content }}
{{ else if eq .Role "tool" }}<SPECIAL_11>User
<TOOL_RESPONSE>[
{{- .Content -}}
]</TOOL_RESPONSE>
{{ else if eq .Role "assistant" }}<SPECIAL_11>Assistant
{{- if and $.IsThinkSet $last .Thinking -}}
<think>
{{ .Thinking }}
</think>
{{ end }}
{{- if .Content }}
{{ .Content }}
{{- end }}
{{- if .ToolCalls }}
<TOOLCALL>[
{{- range $i, $_ := .ToolCalls }}{{ if $i }}, {{ end }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}{{ end -}}
]</TOOLCALL>
{{- end }}
<SPECIAL_12>
{{ end }}
{{- if and $last (ne .Role "assistant") }}<SPECIAL_11>Assistant
<think>
{{- if and $.IsThinkSet (not $.Think) -}}
</think>
{{- end }}
{{- end }}
{{- end }}
"""
PARAMETER stop <SPECIAL_10>
PARAMETER stop <SPECIAL_11>
PARAMETER stop <SPECIAL_12>
PARAMETER stop <think>
```

Save this as `Modelfile`. Then run:

```
ollama pull hf.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q6_K_L
ollama create nemotron-nano:9b-v2-q6_K_L -f Modelfile
```

Feel free to change Q6_K_L/q6_K_L to a different quantization.

The Modelfile adds thinking and tool support, and in my quick tests it seems to work fine, but YMMV.
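For API users asking about `think`/`no_think`: the CLI's `--think=false` flag corresponds to a `think` field in the request body of Ollama's `/api/chat` endpoint. A minimal sketch of building such a request body (the field name is taken from Ollama's thinking support and may vary across server versions, so verify against your installation):

```python
import json

# Sketch: request body for POST /api/chat with thinking disabled,
# mirroring `ollama run ... --think=false`. The "think" field is
# assumed from Ollama's thinking support; check your server version.
payload = {
    "model": "nemotron-nano:9b-v2-q6_K_L",
    "messages": [{"role": "user", "content": "hello"}],
    "think": False,   # set to True (or omit) to keep <think> reasoning
    "stream": False,
}
body = json.dumps(payload)
print(body)
```

The serialized `body` would then be sent with an HTTP client of your choice to `http://localhost:11434/api/chat`.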

```console
$ ollama run nemotron-nano:9b-v2-q6_K_L hello
Thinking...
Okay, the user just said "hello". That's a greeting. I should respond politely and offer help. Let me make sure my response is friendly and open-ended so they feel comfortable asking questions or sharing what's on their mind.
...done thinking.

Hello! How can I assist you today? 😊

$ ollama run nemotron-nano:9b-v2-q6_K_L hello --think=false
Hello! How can I assist you today? 😊

$ ./ollama-run.py nemotron-nano:9b-v2-q6_K_L what time is it --tool get_datetime
Thinking...
Okay, the user asked "what time is it". I need to provide the current time. The available tool is get_datetime, which returns the current date and time in a specific format. Since the user didn't specify a timezone, I should use the local time as per the tool's description. The parameters for get_datetime include an optional timezone_name, but since it's not required and the user didn't mention any specific timezone, I can leave it blank or assume local time. So I'll call get_datetime without specifying the timezone to get the current local time.

...done thinking

calling get_datetime({'timezone_name': ''})
Thinking...
Okay, the user asked "what time is it". I need to provide the current time. The tool get_datetime was called without a timezone, so it returned local time. The response from the tool shows the time as 18:29. I should format this in a friendly way. Let me check the date and time details again. The fulldate is Friday, October 17, 2025 at 18:29. Since the user didn't specify a timezone, using local time is correct. I'll just state the current time clearly.

...done thinking
The current time is **18:29** (6:29 PM) on Friday, October 17, 2025. Let me know if you need help with anything else!
```
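The template above instructs the model to emit tool calls inside a `<TOOLCALL>[...]</TOOLCALL>` block. If you drive the model through `/api/generate` rather than `/api/chat`, you would need to extract those calls yourself. A minimal sketch (`parse_toolcalls` is a hypothetical helper, assuming the model produced well-formed JSON between the tags):

```python
import json
import re

def parse_toolcalls(text: str) -> list[dict]:
    """Extract tool calls from a <TOOLCALL>[...]</TOOLCALL> block.

    Hypothetical helper for the template's tool-call format; assumes
    the model emitted well-formed JSON inside the tags. Returns an
    empty list when no tool-call block is present.
    """
    m = re.search(r"<TOOLCALL>(\[.*?\])</TOOLCALL>", text, re.DOTALL)
    if not m:
        return []
    return json.loads(m.group(1))

# Example shaped like the get_datetime call from the transcript above.
raw = '<TOOLCALL>[{"name": "get_datetime", "arguments": {"timezone_name": ""}}]</TOOLCALL>'
calls = parse_toolcalls(raw)
```

Ollama's `/api/chat` endpoint parses tool calls for you when the template supports them, so this is only needed for raw-completion workflows.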

Reference: github-starred/ollama#54570