[GH-ISSUE #8517] Missing tool support for DeepSeek-R1 Distillates based on Qwen #67546

Closed
opened 2026-05-04 10:45:03 -05:00 by GiteaMirror · 67 comments

Originally created by @odrobnik on GitHub (Jan 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8517

What is the issue?

I tried deepseek-r1:70B and ollama claims that it doesn't support tools.

{
  "error": {
    "message": "registry.ollama.ai/library/deepseek-r1:70B does not support tools",
    "type": "api_error",
    "param": null,
    "code": null
  }
}
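
This is reproducible with any /api/chat request that includes a tools array. A minimal Go sketch of such a request, assuming the default local Ollama port (the tool definition itself is a throwaway):

package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Any /api/chat request that carries a "tools" array triggers the
	// error when the model's template has no tool-call section.
	// Host/port are the Ollama defaults; the tool is a throwaway.
	body := `{"model":"deepseek-r1:70b","stream":false,
	  "messages":[{"role":"user","content":"hi"}],
	  "tools":[{"type":"function","function":{"name":"noop","description":"does nothing"}}]}`
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // expect: "... does not support tools"
}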

Looks to me like the template you have is missing the rules for tools.

The current Ollama template:

{{- if .System }}{{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}<|User|>{{ .Content }}
{{- else if eq .Role "assistant" }}<|Assistant|>{{ .Content }}{{- if not $last }}<|end▁of▁sentence|>{{- end }}
{{- end }}
{{- if and $last (ne .Role "assistant") }}<|Assistant|>{{- end }}
{{- end }}

The template from https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF includes the tool-call handling:

{% if not add_generation_prompt is defined %}
    {% set add_generation_prompt = false %}
{% endif %}
{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}
{%- for message in messages -%}
    {%- if message['role'] == 'system' -%}
        {% set ns.system_prompt = message['content'] %}
    {%- endif -%}
{%- endfor -%}
{{ bos_token }}{{ ns.system_prompt }}
{%- for message in messages -%}
    {%- if message['role'] == 'user' -%}
        {%- set ns.is_tool = false -%}
        {{ '<|User|>' + message['content'] }}
    {%- endif -%}
    
    {%- if message['role'] == 'assistant' and message['content'] is none -%}
        {%- set ns.is_tool = false -%}
        {%- for tool in message['tool_calls'] -%}
            {%- if not ns.is_first -%}
                {{ '<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>' }}
                {%- set ns.is_first = true -%}
            {%- else -%}
                {{ '\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>' }}
                {{ '<|tool▁calls▁end|><|end▁of▁sentence|>' }}
            {%- endif -%}
        {%- endfor -%}
    {%- endif -%}
    
    {%- if message['role'] == 'assistant' and message['content'] is not none -%}
        {%- if ns.is_tool -%}
            {{ '<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>' }}
            {%- set ns.is_tool = false -%}
        {%- else -%}
            {% set content = message['content'] %}
            {% if '</think>' in content %}
                {% set content = content.split('</think>')[-1] %}
            {% endif %}
            {{ '<|Assistant|>' + content + '<|end▁of▁sentence|>' }}
        {%- endif -%}
    {%- endif -%}
    
    {%- if message['role'] == 'tool' -%}
        {%- set ns.is_tool = true -%}
        {%- if ns.is_output_first -%}
            {{ '<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>' }}
            {%- set ns.is_output_first = false -%}
        {%- else -%}
            {{ '\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>' }}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}

{% if ns.is_tool %}
    {{ '<|tool▁outputs▁end|>' }}
{% endif %}

{% if add_generation_prompt and not ns.is_tool %}
    {{ '<|Assistant|>' }}
{% endif %}

OS

macOS

GPU

Apple

CPU

No response

Ollama version

0.5.7

GiteaMirror added the bug label 2026-05-04 10:45:03 -05:00

@cbjuan commented on GitHub (Jan 22, 2025):

I observe the same error for the deepseek-r1:1.5b model

{'error': {'message': 'registry.ollama.ai/library/deepseek-r1:1.5b does not support tools', 'type': 'api_error', 'param': None, 'code': None}}

@IpslWon commented on GitHub (Jan 22, 2025):

The same issue is observed in deepseek-r1:8b

OS
Ubuntu 22.04.5 LTS
GPU
Nvidia
Ollama Version
0.5.7


@arraylabs commented on GitHub (Jan 22, 2025):

And deepseek-r1:14b
Windows 11
Nvidia latest drivers
Latest Ollama


@oceanapplications commented on GitHub (Jan 23, 2025):

Same on deepseek-r1:8b on MacOS.


@majian159 commented on GitHub (Jan 23, 2025):

It seems that DeepSeek's reasoning models, including R1, currently do not support function calling.
However, DeepSeek-V3 does support function calling.
For more details, refer to: https://api-docs.deepseek.com/guides/reasoning_model#api-parameters


@odrobnik commented on GitHub (Jan 23, 2025):

@majian159 That might be true, but the model that Ollama calls deepseek-r1 (https://ollama.com/library/deepseek-r1) isn't the same; those are distilled versions, where they took the 600k reasoning traces and fine-tuned Qwen 2.5 and Llama 3 with them.

Qwen 2.5 supports tool calls very well, and so should the DeepSeek-R1 distillates based on it.

I've spent many hours experimenting with replacing the template for deepseek-r1:32B with one that I borrowed from Qwen2.5. It works quite well and is calling functions properly. The only problem I am unhappy with at this point is that it loses the initial <think> in the final response. The original template didn't have this issue. It must be something with the whitespace handling in Go templates, which I haven't figured out yet.

Original template:

{{- if .System }}{{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}<|User|>{{ .Content }}
{{- else if eq .Role "assistant" }}<|Assistant|>{{ .Content }}{{- if not $last }}<|end▁of▁sentence|>{{- end }}
{{- end }}
{{- if and $last (ne .Role "assistant") }}<|Assistant|>{{- end }}
{{- end }}

This shows the <think> and </think> tokens.

My best attempt so far:

{{- if .Messages }}
{{- if or .System .Tools }}system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a JSON object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}<|end▁of▁sentence|>
{{ end }}

{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|begin▁of▁sentence|>user
{{ .Content }}<|end▁of▁sentence|>

{{- else if eq .Role "assistant" }}<|begin▁of▁sentence|>assistant
{{- if .Content }}
{{ .Content }}
{{- end }}
{{- if .ToolCalls }}
<tool_call>
{{- range .ToolCalls }}
{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end }}
</tool_call>
{{- end }}
{{- if not $last }}<|end▁of▁sentence|>{{ end }}

{{- else if eq .Role "tool" }}<|begin▁of▁sentence|>user
<tool_response>
{{ .Content }}
</tool_response><|end▁of▁sentence|>
{{- end }}

{{- if and (ne .Role "assistant") $last }}<|begin▁of▁sentence|>assistant
{{- end }}

{{- end }}

{{- else }}
{{- if .System }}<|begin▁of▁sentence|>system
{{ .System }}<|end▁of▁sentence|>
{{- end }}

{{- if .Prompt }}<|begin▁of▁sentence|>user
{{ .Prompt }}<|end▁of▁sentence|>
{{- end }}

<|begin▁of▁sentence|>assistant
{{ .Response }}
{{- if .Response }}<|end▁of▁sentence|>{{ end }}
{{- end }}

This does tool calling, but a message without tool calls gets the <think> chomped off somehow.

I've changed the <|im_start|> and <|im_end|> to <|begin▁of▁sentence|> and <|end▁of▁sentence|> respectively. And also removed the outermost BOS and EOS because I would get a warning about duplicate BOS. Without those tags I think the model doesn't see the tool responses.

I think the problem has something to do with {{- in Go templates, which removes whitespace and maybe also a leading tag. But I am a noob at these things.
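
For reference, a minimal standalone Go sketch (standard library only, nothing Ollama-specific) of what the trim marker does: it eats adjacent whitespace, but not non-whitespace text such as a literal <think> token.

package main

import (
	"os"
	"text/template"
)

func main() {
	// Without a trim marker, the newline after <think> survives.
	plain := template.Must(template.New("plain").Parse("<think>\n{{ .Content }}"))
	// "{{- " trims all whitespace immediately before the action (spaces,
	// tabs, newlines), but it never removes non-whitespace text.
	trimmed := template.Must(template.New("trimmed").Parse("<think>\n{{- .Content }}"))

	data := map[string]string{"Content": "hello"}
	_ = plain.Execute(os.Stdout, data) // prints "<think>\nhello"
	os.Stdout.WriteString("\n---\n")
	_ = trimmed.Execute(os.Stdout, data) // prints "<think>hello"
}

So the trim marker alone should not be able to remove a literal <think> that the model emits; something else has to be dropping it.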

Who can help give it the final polish?

Also a question: I tried multiple things, but as soon as I have the section on the tool calls, there is never any content. But without the tool calls part I see thinking with embedded tool calls. Does Ollama remove all content if there are tool calls in it?


@nikito commented on GitHub (Jan 23, 2025):

I thought I had read somewhere that the distillation process breaks function calling, or at least has a negative impact on it?


@odrobnik commented on GitHub (Jan 23, 2025):

> I thought I had read somewhere that the distillation process breaks function calling, or at least has a negative impact on it?

What you are probably referring to is that during training of the first DeepSeek-R1 Zero they had issues with mixing languages and basic functionality, but that didn't affect the distills.

The function calling works fine in my tests. My only problem is that something removes the leading <think> tag from the response stream.


@odrobnik commented on GitHub (Jan 23, 2025):

I found that the problem is in Ollama itself: as soon as you add tool support to the template, the leading <think> tag gets cut off: #8552


@odrobnik commented on GitHub (Jan 23, 2025):

PS: I just tested function calling on LM Server, with this DeepSeek-R1 Qwen 32B distill: https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Qwen-32B-3bit:


The model always creates thinking tokens, but they get discarded for tool calls. Nevertheless, they come through in the final message. I'm showing them in a collapsible section.

Of course I can hack it such that I add a leading <think> if I see a </think> in the content, but that's definitely an ollama bug.
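
In the meantime, a minimal client-side sketch of that hack (the helper name restoreThink is made up for illustration; this patches the symptom, not the underlying bug):

package main

import (
	"fmt"
	"strings"
)

// restoreThink re-adds a leading <think> tag when a reply contains a
// closing </think> without a matching opener. Purely a client-side
// band-aid; the name is made up for this sketch.
func restoreThink(content string) string {
	if strings.Contains(content, "</think>") && !strings.Contains(content, "<think>") {
		return "<think>" + content
	}
	return content
}

func main() {
	fmt.Println(restoreThink("reasoning...</think>final answer"))
}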


@odrobnik commented on GitHub (Jan 24, 2025):

I found that the problem with the disappearing <think> was caused by superfluous BOS tokens. Apparently those aren't necessary and the one that the tokenizer automatically adds is enough.

Another possible reason for the issues might be that several Qwen tokens used in the template might be missing due to a new pre-tokenizer that's not yet in llama.cpp #8547

Apparently this was released just now: https://github.com/ggerganov/llama.cpp/releases/tag/b4547

How can I tell which llama.cpp version is used by a particular ollama version? @rick-github


@rick-github commented on GitHub (Jan 24, 2025):

Latest vendor commit is in the header: https://github.com/ollama/ollama/blob/main/llama/llama-cpp.h


@oceanapplications commented on GitHub (Jan 27, 2025):

Built latest Ollama from main and still the same no tools error.


@tompipe commented on GitHub (Jan 27, 2025):

Could be related, or a combination of issues (or my current lack of knowledge in this realm), but I'm experiencing similar issues, and can share some findings which may help shed some light on the issue.

I'm using litellm, and when submitting a request which contains a tools object to a deepseek model on ollama, litellm performs a check to verify whether the model supports function calling. It does this (https://github.com/BerriAI/litellm/blob/6bafdbc546a0c081b686e4044f4f552acc9ce41f/litellm/llms/ollama/completion/transformation.py#L180) by retrieving the model info and checking whether "the 'template' field in the ollama_model_info contains a 'tools' or 'function' key".
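
A rough Go sketch of that kind of capability check against Ollama's /api/show endpoint (assumptions: Ollama on the default port, and a substring heuristic that only approximates litellm's actual logic):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Ask Ollama for the model's template, then apply the same kind of
	// substring heuristic litellm is described as using. The heuristic
	// here is an approximation, not litellm's exact code.
	resp, err := http.Post("http://localhost:11434/api/show", "application/json",
		strings.NewReader(`{"model":"deepseek-r1:8b"}`))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var info struct {
		Template string `json:"template"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		panic(err)
	}

	tmpl := strings.ToLower(info.Template)
	supportsTools := strings.Contains(tmpl, "tools") || strings.Contains(tmpl, "function")
	fmt.Println("supports tools:", supportsTools)
}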

If litellm thinks the model doesn't support function calling (i.e. no tools or function keywords in the template), then within the map_openai_params function in ollama_chat.py (https://github.com/BerriAI/litellm/blob/6bafdbc546a0c081b686e4044f4f552acc9ce41f/litellm/llms/ollama_chat.py#L159-L194) it sets up some logic so the function_call_prompt (https://github.com/BerriAI/litellm/blob/6bafdbc546a0c081b686e4044f4f552acc9ce41f/litellm/litellm_core_utils/prompt_templates/factory.py#L2966) will inject function support into the prompt.

In doing this, it also forces "format": "json" 😩

Whilst trying to debug, I've observed a few things, and I've verified these same issues with curl requests to ollama, adding the same adapted function calling prompt that litellm injects (as without this prompt 'massaging' I get the
{"error":{"message":"hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M does not support tools","type":"api_error","param":null,"code":null}} error).

I've tried these with the hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M and deepseek-r1 models

Posting to the /api/chat endpoint, if both "format": "json" and "stream": true is set in the request, I observe the following message in the ollama logs:

2025-01-27 10:50:16 time=2025-01-27T10:50:16.161Z level=DEBUG source=server.go:816 msg="prediction aborted, token repeat limit reached"

And I get repeating chunks like this in the responses, but no stop/done message (or think content):

{"model":"hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-01-27T10:50:15.606899556Z","message":{"role":"assistant","content":" \n\n"},"done":false}
{"model":"hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-01-27T10:50:15.627168908Z","message":{"role":"assistant","content":" \n\n"},"done":false}
{"model":"hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-01-27T10:50:15.646847462Z","message":{"role":"assistant","content":" \n\n"},"done":false}
{"model":"hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-01-27T10:50:15.667172833Z","message":{"role":"assistant","content":" \n\n"},"done":false}
{"model":"hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-01-27T10:50:15.700766745Z","message":{"role":"assistant","content":" \n\n"},"done":false}
<repeated about 25 times in total>

If I set "stream": false, I get the same error in the logs, the following response, and no stop/done message (or think content):

{"model":"hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-01-27T10:51:47.19645733Z","message":{"role":"assistant","content":"{\n  \"name\": \"execute_shell_command\",\n  \"parameters\": {\n    \"shell_command\": \"echo 'Hello, World!' \u0026\u0026 date\"\n  }\n}\n \n\n  \t\t\t   \t                        \n\t\t\t                                                                \n\t\t\t                                                                 \n\t\t\t            \n\t\t            \n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t  \n\t\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\n\n\n\n  \t\t\t\t\t\t\t\n\t\t\n"},"done":false}

Without "format": "json" (streaming or not), the response completes with think content and what appears to be a correctly formatted function call in the message, along with a stop/done message, but "done_reason" is always "stop" and not "function_call".

Posting to the /v1/chat/completions OpenAI-compatible endpoint, "format": "json" and "stream": true work together here, and I don't see the superfluous \n \t tokens in any messages. Again, though, the stop message doesn't indicate a function call, despite the model generating one in the message.

If I switch to a deepseek model such as MFDoom/deepseek-r1-tool-calling:latest, which reports to litellm that it supports function calling (and therefore litellm doesn't inject into the prompt, and the "tools" object remains in the request), I don't get the does not support tools error from ollama, and the model generates a response. But I see similar issues when posting to the /api/chat endpoint when "format": "json":

2025-01-27 11:38:03 time=2025-01-27T11:38:03.501Z level=DEBUG source=server.go:816 msg="prediction aborted, token repeat limit reached"

And I never seem to get a done/end response, nor a correct "finish_reason": "tool_calls" message.

However, posting to the /v1/chat/completions endpoint, with any combination of "format": "json" and "stream": true I do get responses running to completion without errors, valid tool call responses, but no think content.

If "stream": true then I get a chunk which looks like a valid function call response, except "finish_reason": null followed by another chunk with "finish_reason": "stop"

With streaming off, I get a valid tool call response (but no think content 😒):

{
    "id": "chatcmpl-53",
    "object": "chat.completion",
    "created": 1737978619,
    "model": "MFDoom/deepseek-r1-tool-calling:latest",
    "system_fingerprint": "fp_ollama",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "call_j7shlgzh",
                        "index": 0,
                        "type": "function",
                        "function": {
                            "name": "execute_shell_command",
                            "arguments": "{\"shell_command\":\"echo 'hello' | exit\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 163,
        "completion_tokens": 220,
        "total_tokens": 383
    }
}

@rick-github commented on GitHub (Jan 27, 2025):

> Built latest Ollama from main and still the same no tools error.

Tool calling is a function of the model template, not the ollama binary.

> it sets up some logic so the function_call_prompt will inject function support into the prompt.

This is a common way to get a non-tool using model to use tools: https://github.com/ollama/ollama/issues/6061

> In doing this, it also forces "format": "json" 😩

It does this because, if it doesn't, models generate a lot of whitespace.

2025-01-27 10:50:16 time=2025-01-27T10:50:16.161Z level=DEBUG source=server.go:816 msg="prediction aborted, token repeat limit reached"

This is caused by the repeating whitespace.

If "stream": true then I get a chunk which looks like a valid function call response, except "finish_reason": null followed by another chunk with "finish_reason": "stop"

Streaming is currently not supported with tool support (https://github.com/ollama/ollama/issues/7886).

> With streaming off, I get a valid tool call response (but no think content 😒):

You can't get a tool response and a text response in the same completion: https://github.com/ollama/ollama/issues/8337


@tompipe commented on GitHub (Jan 27, 2025):

Thanks! That's helped clear a few things up. As I say, I'm brand new to this, and it's a pretty steep learning curve!

> > In doing this, it also forces "format": "json" 😩
>
> It does this because if it doesn't models generate a lot of whitespace.

Yeah, my concern here was that allegedly DeepSeek doesn't support JSON output, so forcing it on when the model might not support it seemed a bit odd. And I seemed to get more whitespace issues with it forced on than with it off.


And I just found it strange that ollama's OpenAI-compatible endpoint doesn't have the same issues as /api/chat when requesting JSON output.


@rick-github commented on GitHub (Jan 27, 2025):

I answered this in a different issue, but it's probably of interest to the folks subscribed to this thread.

A tool-enabled deepseek-r1 does "thinking", so in theory it is likely to respond more consistently and accurately. In theory: I haven't tested this yet.

If I make a request that uses a tool:

$ curl -s localhost:11434/api/chat -d '{
  "model":"MFDoom/deepseek-r1-tool-calling",
  "stream":false,
  "messages":[
    {"role":"user","content":"what is the weather in paris"}
  ],
  "tools":[
    {"type":"function","function":{"name":"get_current_weather","description":"get the weather"}}
  ]}'

I get the result:

{
  "model": "MFDoom/deepseek-r1-tool-calling",
  "created_at": "2025-01-27T15:25:07.447047756Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_current_weather",
          "arguments": {
            "location": "Paris",
            "units": "metric"
          }
        }
      }
    ]
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 2937773985,
  "load_duration": 359848546,
  "prompt_eval_count": 95,
  "prompt_eval_duration": 17000000,
  "eval_count": 206,
  "eval_duration": 2559000000
}

The text response from the model is converted to a tool call and the content is discarded. However, if we look behind the scenes and monitor the direct output of the model, we can see that it is doing "thinking":

<think>

Alright, I'm trying to figure out how to respond to this user's query.
They want me to call a function called get _current _weather and provide
its parameters.  The prompt asks for the current weather in Paris.
First, I need to look at the existing JSON template they provided.
It has an example with "get_current_weather " as the name and an
empty dictionary for parameters.  My task is to fill that dictionary
properly.  I remember that when making API calls, you usually include
parameters like location.  Since the user specified Paris, I should
set the location parameter to "Paris ".  Next, considering the units
whether they want metric or imperial measurements I'll default to
"metric" unless otherwise specified.  So, I'll add "units": "metric".
Putting it all together, the JSON should have these two keys : "location"
and "units", with their respective values.
</think>

{"name": "get_current_weather", "parameters": {"location": "Paris", "units": "metric"}}

@odrobnik commented on GitHub (Jan 27, 2025):

There are three main issues:

  1. The current template is missing the tool-calling support parts you can see e.g. in the one for Qwen2.5.

  2. There is some part of ollama that looks for tool calls in the original output and, if there are any, discards the other text content. Ideally we would still get the generated thinking in the content. I am not certain about this, but whenever the template got tool calls from the model, the content was empty.

  3. I semi-successfully added tool calling to the template, but it seems a bit unreliable. I think this is because the special tokens for the tool calls are missing, due to the tokenizer not implementing them.


@DevinduSamarasinghe commented on GitHub (Jan 30, 2025):

https://ollama.com/MFDoom/deepseek-r1-tool-calling:8b

The above model supports tool calling.


@niltonvasques commented on GitHub (Jan 31, 2025):

I tried it here on Langflow, but it doesn't call the tool correctly; at least it is not raising errors saying that the model has no tool-calling features.


@rick-github commented on GitHub (Jan 31, 2025):

I believe that some frameworks hide the fact that some models don't do tools, and use the old insert-tool-in-system-prompt method to implement function calling.
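
For illustration, a minimal Go sketch of that insert-tool-in-system-prompt pattern (the injected wording here is made up; each framework phrases it differently):

package main

import (
	"encoding/json"
	"fmt"
)

// buildSystemPrompt injects tool definitions into a plain system prompt so
// that a model without native tool support can still emit tool calls as
// JSON. The prompt wording is illustrative only.
func buildSystemPrompt(tools []map[string]any) string {
	spec, _ := json.Marshal(tools)
	return "You may call one of the following tools when needed:\n" +
		string(spec) +
		"\nTo call a tool, respond ONLY with JSON of the form " +
		`{"name": "<function>", "parameters": {...}}.`
}

func main() {
	tools := []map[string]any{
		{"type": "function", "function": map[string]any{
			"name":        "get_weather",
			"description": "Get current temperature for a given location.",
		}},
	}
	fmt.Println(buildSystemPrompt(tools))
}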


@odrobnik commented on GitHub (Jan 31, 2025):

> https://ollama.com/MFDoom/deepseek-r1-tool-calling:8b
>
> The above model supports tool calling.

I tested it; there are several issues:

  1. For the first couple of tries it didn't even call any tools.

  2. There is a leading </think> in the output.

"</think>\n\n**Step 1:** The user wants to know whose birthday is next between Oliver and Sylvia.\n\n**Step 2:** I\'ll use the `birthdate` tool to find both of their birthdates.\n\n**Step 3:** After getting both dates, I\'ll apply the `date_difference` tool to determine how many days each has left until their birthday this year.\n\n**Step 4:** Compare the two numbers to see who has a closer birthday in terms of days remaining.\n</think>\n\n**Final Answer:**\n\nSylvia\'s birthday is next this year."

  3. The 14B version does call some tools, but it doesn't come to the right conclusions in my test case involving multiple reasoning steps.

  4. Even with the 32B version I get results like this:

[{"index":0,"message":{"role":"assistant","content":"\u003c/think\u003e\n\n{\n \"name\": \"date_ difference\",\n \"parameters\": {\n  \"first Date\": get_current_date (),\n  \"last_Date\": birthdate ({user: \"Oliver\"})\n }\n}"},"finish_reason":"stop"}],"usage":{"prompt_tokens":471,"completion_tokens":42,"total_tokens":513}}

At other times there are spaces in the names of functions. Very odd and unreliable behavior.


@tompipe commented on GitHub (Feb 2, 2025):

So I gave it a full test using the OpenAI examples (https://platform.openai.com/docs/guides/function-calling) - hopefully this adds further context.

Request 1
curl -s localhost:33821/v1/chat/completions -d '{
    "model": "MFDoom/deepseek-r1-tool-calling",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Paris today?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current temperature for a given location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and country e.g. Bogotá, Colombia"
                        }
                    },
                    "required": [
                        "location"
                    ],
                    "additionalProperties": false
                },
                "strict": true
            }
        }
    ]
}'

Ollama log entry

2025-02-02 16:28:39 time=2025-02-02T16:28:39.269Z level=DEBUG source=routes.go:1470 msg="chat request" images=0 prompt="\r\nWhen using a tool, format as:\r\n{\"name\": \"function_name\", \"parameters\": {\"param1\": \"value1\"}}\r\nThe following tools are available when needed for specific tasks:\r\n[{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"description\":\"Get current temperature for a given location.\",\"parameters\":{\"type\":\"object\",\"required\":[\"location\"],\"properties\":{\"location\":{\"type\":\"string\",\"description\":\"City and country e.g. Bogotá, Colombia\"}}}}}]<|User|>What is the weather like in Paris today?\r\nGiven the tools, please respond with a JSON object for a function call with its proper arguments that best answers the given prompt.\r\nRespond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}. Do not use variables.<|Assistant|>\r\n\r\n"

Response

{
    "id": "chatcmpl-295",
    "object": "chat.completion",
    "created": 1738513724,
    "model": "MFDoom/deepseek-r1-tool-calling",
    "system_fingerprint": "fp_ollama",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "call_y4oi4to8",
                        "index": 0,
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"location\":\"Paris, France\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 160,
        "completion_tokens": 263,
        "total_tokens": 423
    }
}

I then submitted the follow-up request, appending the tool-call message and the output of the tool message to the original request.

Request 2
curl -s localhost:33821/v1/chat/completions -d '{
    "model": "MFDoom/deepseek-r1-tool-calling",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Paris today?"
        },
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": "call_0r878gkj",
                    "index": 0,
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "arguments": "{\"location\":\"Paris, France\"}"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "tool_call_id": "call_0r878gkj",
            "content": "The current temperature in Paris is 14°C (57.2°F)."
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current temperature for a given location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and country e.g. Bogotá, Colombia"
                        }
                    },
                    "required": [
                        "location"
                    ],
                    "additionalProperties": false
                },
                "strict": true
            }
        }
    ]
}'

Ollama log entry

2025-02-02 16:30:50 time=2025-02-02T16:30:50.573Z level=DEBUG source=routes.go:1470 msg="chat request" images=0 prompt="\r\nWhen using a tool, format as:\r\n{\"name\": \"function_name\", \"parameters\": {\"param1\": \"value1\"}}\r\nThe following tools are available when needed for specific tasks:\r\n[{\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"description\":\"Get current temperature for a given location.\",\"parameters\":{\"type\":\"object\",\"required\":[\"location\"],\"properties\":{\"location\":{\"type\":\"string\",\"description\":\"City and country e.g. Bogotá, Colombia\"}}}}}]<|User|>What is the weather like in Paris today?\r\n<|Assistant|>\r\n<|end▁of▁sentence|>\r\n<|tool▁outputs▁begin|>\r\n<|tool▁output▁begin|>\r\nThe current temperature in Paris is 14°C (57.2°F).\r\n<|tool▁output▁end|>\r\n<|tool▁outputs▁end|><|Assistant|>\r\n\r\n"

Response

{
    "id": "chatcmpl-65",
    "object": "chat.completion",
    "created": 1738513851,
    "model": "MFDoom/deepseek-r1-tool-calling",
    "system_fingerprint": "fp_ollama",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "call_2idcm56l",
                        "index": 0,
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"location\":\"Paris\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 174,
        "completion_tokens": 23,
        "total_tokens": 197
    }
}

Which seems like it just wants to call the tool again. But if the follow-up request is submitted without the tools object:

Response 3

Ollama log entry

2025-02-02 16:33:52 time=2025-02-02T16:33:52.068Z level=DEBUG source=routes.go:1470 msg="chat request" images=0 prompt="<|User|>What is the weather like in Paris today?\r\n<|Assistant|>\r\n<|end▁of▁sentence|>\r\n<|tool▁outputs▁begin|>\r\n<|tool▁output▁begin|>\r\nThe current temperature in Paris is 14°C (57.2°F).\r\n<|tool▁output▁end|>\r\n<|tool▁outputs▁end|><|Assistant|>\r\n\r\n"
{
    "id": "chatcmpl-75",
    "object": "chat.completion",
    "created": 1738514032,
    "model": "MFDoom/deepseek-r1-tool-calling",
    "system_fingerprint": "fp_ollama",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "</think>\n\nThe current weather conditions in Paris are as follows: the temperature is **14°C** (57.2°F)."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 78,
        "completion_tokens": 28,
        "total_tokens": 106
    }
}

Then it does digest the tool output, but as @odrobnik observed, it has a leading </think> along with some other occasional weirdness, like:

  "message": {
      "role": "assistant",
      "content": "</think>\n\n<|tool▁outputs●begin|>\n<|tool●outputs|>\n<|tool output starts here|>\n\nThe current weather in Paris is [insert live weather description].\n\n[insert real-time data such as temperature, precipitation chances, wind speed, and conditions like sunny, cloudy, rainy.]\n\n<|tool output ends here|>\n<|tool●outputs|>\n<|tool outputs●end |>\n\nPlease note: The information provided should be checked through a reliable weather service for accuracy."
  }

Other notes

  • Ensuring "additionalProperties": false and "strict": true were present in the request prevented it from 'making up' function arguments like time=today or calling non-existent functions like get_weather_today
  • Re-submitting the same request appears not to use the given tool output if it's already been consumed; it'll respond with something like "I can't get current weather, consider consulting a reliable weather service like AccuWeather...". But resubmitting the request with an edited messages[1].tool_calls[0].id and messages[2].tool_call_id seems to work (e.g. from call_0r878gkj to call_0r878gkk). Is there some sort of caching or a mechanism ensuring tool calls can only be used once?
  • At some stage during this testing, I did get a valid think output along with it digesting the results of a tool call; the message was something like "Okay, so I'm looking at this problem where the user asked to get the weather in Paris. They provided a chat history showing previous interactions with me. I see that in earlier interactions that ...." and it correctly spat out the current weather after its reasoning. But unfortunately I haven't been able to reproduce it again.

I'll keep digging

@odrobnik commented on GitHub (Feb 2, 2025):

I am pretty sure that ollama doesn't know how to assign tool call responses to the proper tool calls if there is more than one.

If there is just one it often works, but with multiple responses it won't work reliably, because the id field doesn't get put into the template, so the model only sees the pure response text.

To prove this, you can add the function name and parameters in front of the response; the LLM will then know which answer belongs to which call. I think this might be an ollama issue in general, because LM Studio seems to have no issue with multiple tool responses.
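
For illustration, a minimal Go sketch of that workaround (the "tool" message shape matches the OpenAI-style payloads above; the prefix format is my own invention, not an ollama API):

// Sketch of the workaround described above (my own illustration): prefix
// each tool result with the call that produced it, so the model can match
// answers to calls even though the template drops the tool_call_id.
package main

import (
	"encoding/json"
	"fmt"
)

// result pairs a finished tool call with the text its tool returned.
type result struct {
	Name string // function that was called
	Args string // raw JSON arguments it was called with
	Text string // what the tool returned
}

func main() {
	results := []result{
		{"get_weather", `{"location":"Paris, France"}`, "The current temperature in Paris is 14°C (57.2°F)."},
		{"get_weather", `{"location":"Berlin, Germany"}`, "The current temperature in Berlin is 9°C (48.2°F)."},
	}

	// Build the "tool" messages for the follow-up request, prefixing each
	// result with the originating call so the model can disambiguate them.
	var messages []map[string]string
	for _, r := range results {
		messages = append(messages, map[string]string{
			"role":    "tool",
			"content": fmt.Sprintf("%s(%s) returned: %s", r.Name, r.Args, r.Text),
		})
	}
	out, _ := json.MarshalIndent(messages, "", "  ")
	fmt.Println(string(out))
}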

@tompipe commented on GitHub (Feb 4, 2025):

Ok, so a little more digging, and I think I've uncovered one of the possible/related issues. This one I think is the cause of the occasional multiple/duplicate tool calls, and perhaps also of the sporadic random function names in the tool_calls array (followed by a valid tool call).

If (as I understand it) the ollama code for parsing the tool calls (https://github.com/ollama/ollama/blob/f9d2d8913554d78b1cae47c5eaa9cbbd0ea79273/server/model.go#L132) receives the full response, including the content between the <think> tags, then suppose the model returned:

<think>
Okay, I need to figure out how to respond to the user's question about the weather in Paris today using the tools provided. The available tool is "get_weather," which requires a location parameter. 

First, the user's query is pretty straightforward: they just want the current temperature and maybe some additional info for Paris, France. Since the tool specifically asks for the location as a string that includes both the city and country, I should make sure to use "Paris, France" as the location.

I should structure my response according to the required JSON format. That means calling the function with the appropriate parameters. So it would be {"name": "get_weather", "parameters":{"location":"Paris, France"}}.

Once I send this request, the tool will process it and return a JSON object containing the temperature data for that location today. After receiving the response, I'll format it into an explanation so the user understands how to interpret the results.
</think>

{"name": "get_weather", "parameters":{"location":"Paris, France"}}

Ollama's parseObjects function will try to parse the content (including anything inside the think output) for any valid JSON. If this maps to a valid function call with a name and arguments, then it appears that multiple/duplicate tool call entries will be spat out, something like:

{
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\":\"Paris, France\"}"
            }
        },
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\":\"Paris, France\"}"
            }
        }
    ]
}

This may also explain a few of the responses where I've seen it want to call non-existent functions like get_weather_today.

If, during the thinking, the model considered an ideal function (such as "get_weather_today") or evaluated other tools before settling on the correct one to call, these considerations would still be output in ollama's response, and could/would end up being called by the client.

Slightly concerning given the contrived scenario where a model is provided CRUD tools, and it considers each one, e.g:

<think>
... OK, I won't call {\"name\": \"delete_record\", \"parameters\":{\"id\":\"1\"}} because the documentation says that will delete the record. I think the right tool call would be {\"name\": \"update_record\", \"parameters\":{\"id\":\"1\"}}
</think>
{\"name\": \"update_record\", \"parameters\":{\"id\":\"1\"}}

I've set up an example go playground (https://go.dev/play/p/qK4vKE3Z6r9) using the code lifted from ollama, as a quick 'proof of concept', if anyone wants to confirm my findings.

I'm sure this isn't just going to affect the DeepSeek/Qwen distillates, so it probably wants to be hacked off into a separate issue.

@odrobnik commented on GitHub (Feb 4, 2025):

I believe that the tool call parsing should ignore the thinking part. Ideally the thinking would be put into the reasoning_content property, as the original DeepSeek API does.
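
For illustration, a sketch of that split (my own assumption of the shape; the reasoning_content field name mirrors DeepSeek's hosted API, not anything ollama currently emits):

// Sketch only: separate a raw completion into reasoning_content and
// content, mirroring the field DeepSeek's own API exposes.
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type message struct {
	Role             string `json:"role"`
	Content          string `json:"content"`
	ReasoningContent string `json:"reasoning_content,omitempty"`
}

// splitThink separates reasoning from the answer. It tolerates both a full
// <think>...</think> block and the truncated leading </think> seen in the
// logs above.
func splitThink(raw string) message {
	m := message{Role: "assistant", Content: strings.TrimSpace(raw)}
	if i := strings.Index(raw, "</think>"); i >= 0 {
		m.ReasoningContent = strings.TrimSpace(strings.TrimPrefix(raw[:i], "<think>"))
		m.Content = strings.TrimSpace(raw[i+len("</think>"):])
	}
	return m
}

func main() {
	raw := "<think>\nThe tool already returned 14°C for Paris, so I can answer directly.\n</think>\n\nThe current temperature in Paris is 14°C (57.2°F)."
	out, _ := json.MarshalIndent(splitThink(raw), "", "  ")
	fmt.Println(string(out))
}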

@rick-github commented on GitHub (Feb 4, 2025):

You're expending a lot of effort to get the distillate to use tools. Why not just use the original base model?

@tompipe commented on GitHub (Feb 4, 2025):

I believe that the tool call parsing should ignore the thinking part.

As far as I can tell, parseToolCalls (https://github.com/ollama/ollama/blob/65b7ecac7bd4346fae8f49764b0d6d2eb8de39ae/server/routes.go#L1505) is being passed the full content, which I believe still includes the think tags, though I'm not familiar with Go or the ollama pipeline, so I can't be certain. But it definitely aligns with my findings/testing.

Ideally the thinking would be put into the reasoning_content property, as the original DeepSeek API does.

Absolutely. This would be great

@tompipe commented on GitHub (Feb 4, 2025):

You're expending a lot of effort to get the distillate to use tools. Why not just use the original base model?

Ha, I would if I could download moar vramz 😄

@rick-github commented on GitHub (Feb 4, 2025):

Not deepseek-r1:671b, qwen2.5:7b or llama3.1:8b.

@tompipe commented on GitHub (Feb 4, 2025):

Not deepseek-r1:671b, qwen2.5:7b or llama3.1:8b.

Ah I see. Well if something doesn't work as it should, I tend to get an itch to understand why. And this one got me scratching 🤣

@wangjiyang commented on GitHub (Feb 5, 2025):

https://ollama.com/MFDoom/deepseek-r1-tool-calling:8b
The above model supports tool calling.

I tested it; there are several issues:

  1. for the first couple of tries it didn't even call any tools.
  2. There is a leading </think> in the output.
"</think>\n\n**Step 1:** The user wants to know whose birthday is next between Oliver and Sylvia.\n\n**Step 2:** I\'ll use the `birthdate` tool to find both of their birthdates.\n\n**Step 3:** After getting both dates, I\'ll apply the `date_difference` tool to determine how many days each has left until their birthday this year.\n\n**Step 4:** Compare the two numbers to see who has a closer birthday in terms of days remaining.\n</think>\n\n**Final Answer:**\n\nSylvia\'s birthday is next this year."

The 14B version does call some tools, but it doesn't come to the right conclusions in my test case involving multiple reasoning steps.

Even with the 32B version I get results like this:

[{"index":0,"message":{"role":"assistant","content":"\u003c/think\u003e\n\n{\n \"name\": \"date_ difference\",\n \"parameters\": {\n  \"first Date\": get_current_date (),\n  \"last_Date\": birthdate ({user: \"Oliver\"})\n }\n}"},"finish_reason":"stop"}],"usage":{"prompt_tokens":471,"completion_tokens":42,"total_tokens":513}}

At other times there are spaces in the names of functions. Very odd and unreliable behavior.

I also noticed some extra spaces are generated after "." or "/"; e.g., stdio.h became "stdio. h". This happened even in normal chat generation without tool support. After some investigation, it is very probably caused by the tokenizer.

@mozophe commented on GitHub (Feb 11, 2025):

I think this issue is partly related to https://github.com/ollama/ollama/issues/8982.

Also, there are a few tool supported versions of DeepSeek-R1-Qwen2.5 available in ollama library: https://ollama.com/search?c=tools&q=deepseek

Could be a good idea to either test those models or use their templates.

@robwilkes commented on GitHub (Feb 12, 2025):

This may not help, however Groq supports R1 distill models for tool calling, so it definitely seems possible to do.

https://console.groq.com/docs/tool-use

@xiaoming2624 commented on GitHub (Feb 14, 2025):

hi,
Also with deepseek-r1:14b
Ubuntu 20.04.6 LTS
Latest Ollama
webui-dify
search-duckduckgo or searxng

[ollama] Error: API request failed with status code 400:
{"error":"registry.ollama.ai/library/deepseek-r1:14b does not support tools"}

(screenshot: https://github.com/user-attachments/assets/9aafd1b7-18dc-4632-877b-663e671cb2cb)

@codex-horizon commented on GitHub (Feb 26, 2025):

I observed the same error with the deepseek-r1:1.5b model:

{'error': {'message': 'registry.ollama.ai/library/deepseek-r1:1.5b does not support tools', 'type': 'api_error', 'param': None, 'code': None}}

java.lang.RuntimeException: [400] Bad Request - {"error":"registry.ollama.ai/library/deepseek-r1:70b does not support tools"}

@f2bo commented on GitHub (Feb 26, 2025):

@odrobnik Have you seen this issue https://github.com/ggml-org/llama.cpp/issues/11861#issuecomment-2660718488? It seems to be the same and apparently has been fixed, though I'm not sure when the fix will be taken by ollama.

@zoeshawwang commented on GitHub (Mar 1, 2025):

same here

@jesusmogollon commented on GitHub (Mar 3, 2025):

+1

@vishalmakwana111 commented on GitHub (Mar 6, 2025):

+1

@blackhawkee commented on GitHub (Mar 9, 2025):

same issue +1

@liupums commented on GitHub (Mar 12, 2025):

figured out and confirmed that qwen2.5 supports tools

https://ollama.com/library/qwen2.5

$ curl -s localhost:11434/api/chat -d '{ "model":"qwen2.5", "stream":false, "messages":[ {"role":"user","content":"what is the weather in paris"} ], "tools":[ {"type":"function","function":{"name":"get_current_weather","description":"get the weather"}} ]}'

{"model":"qwen2.5","created_at":"2025-03-12T01:25:02.5036295Z","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"get_current_weather","arguments":{"query":"Paris"}}}]},"done_reason":"stop","done":true,"total_duration":4318202200,"load_duration":2173890400,"prompt_eval_count":147,"prompt_eval_duration":770000000,"eval_count":21,"eval_duration":1111000000}

@crystal-coding-time commented on GitHub (Apr 27, 2025):

Same issue +1

@maccman commented on GitHub (May 20, 2025):

Same issue

@JunHyeokYoo commented on GitHub (May 20, 2025):

+1

@YuSheng1223 commented on GitHub (May 23, 2025):

same issue

@udaykumarbpatel commented on GitHub (May 24, 2025):

+1

@wuhongsheng commented on GitHub (May 26, 2025):

+1

@anishcorratech commented on GitHub (Jun 7, 2025):

same issue,

ollama version is 0.9.0

2025/06/07 13:40:42 INFO Model loaded provider=ollama model=deepseek-r1:1.5b
2025/06/07 13:40:42 INFO Initializing server... name=filesystem
2025/06/07 13:40:42 INFO Initializing server... name=sqlite
2025/06/07 13:40:44 INFO Server connected name=filesystem
2025/06/07 13:40:44 INFO Server connected name=sqlite
2025/06/07 13:40:44 INFO Tools loaded server=filesystem count=11
2025/06/07 13:40:44 INFO Tools loaded server=sqlite count=6

  You: hi
2025/06/07 13:40:47 INFO Shutting down MCP servers...
2025/06/07 13:40:47 INFO Server closed name=filesystem
2025/06/07 13:40:47 INFO Server closed name=sqlite
Error: registry.ollama.ai/library/deepseek-r1:1.5b does not support tools

@ctcanbol commented on GitHub (Jun 23, 2025):

Are there any plans to fix this?

@Jayian1890 commented on GitHub (Jun 26, 2025):

Since January with no fix is kinda absurd, ngl

@qwerty108109 commented on GitHub (Jul 3, 2025):

I do not think there's any question here: the model itself definitely supports tool calling, but Ollama lacks tool support for this model.
This is definitely a bug that has been encountered by more than a few people.

@qwerty108109 commented on GitHub (Jul 3, 2025):

I can confirm this issue has not been fixed on the latest version of Ollama v0.9.5.

@tko commented on GitHub (Jul 3, 2025):

might help: https://github.com/ollama/ollama/pull/11273

@ctcanbol commented on GitHub (Jul 3, 2025):

I just switched to vLLM. Plus, better tokens per sec. Not fixing this bug for such a popular model for more than a month is unacceptable.

@qwerty108109 commented on GitHub (Jul 3, 2025):

Ollama is a completely volunteer project to my knowledge, whereas vLLM has money and institutional backers.
@ctcanbol

@Jayian1890 commented on GitHub (Jul 4, 2025):

Ollama is a completely volunteer project to my knowledge. Where VLLM has money and institutional backers. @ctcanbol

I can submit a pull request if it’s worth the effort. Don’t want to waste the time if it won’t be merged.

@qwerty108109 commented on GitHub (Jul 4, 2025):

Just to caveat this: I do not speak on behalf of the Ollama project, but to my knowledge Ollama is constantly looking to add new contributors and does have very high quality control standards. Here is a link to the developer guide if you're interested in contributing: https://github.com/ollama/ollama/blob/main/docs/development.md
@Jayian1890

@ver007 commented on GitHub (Jul 7, 2025):

I just switched to vLLM. Plus, better tokens per sec. Not fixing this bug for such a popular model for more then a month is unacceptable.

Does vLLM + DeepSeek tool calling work OK?

@ver007 commented on GitHub (Jul 7, 2025):

Ollama is a completely volunteer project to my knowledge. Where VLLM has money and institutional backers. @ctcanbol

You are delaying the solution because the model originated from China. This is regional discrimination.

@mcr-ksh commented on GitHub (Jul 26, 2025):

+1

registry.ollama.ai/library/deepseek-r1:7b does not support tools

Also for the 7b model. A pity.

@tomaszkiewicz commented on GitHub (Jul 26, 2025):

https://docs.boundaryml.com/home - this is NOT a solution to the problem, but the project is VERY interesting in the way it handles prompting and tooling. Moreover, it doesn't require native tool calling support, so it works with ANY model; I tried it with many models on Ollama, including DeepSeek and Qwen.

@ParthSareen commented on GitHub (Aug 3, 2025):

Hey folks! Sorry, I didn't see this issue earlier, but we have been working for a while to try and get tool calling working on the distills.

There are a couple of issues which make it difficult to support right now.

  1. The distills do not output the right token sequence, compared to the full-sized model, for tool calling. Our parser needs the prefix to match correctly in order to parse the tool call. We need this information to be accurate as we have to distinguish between thinking, chatting, and tool calling.

  2. Custom instructions can get shifted out. This is kind of fixable just by putting them in a system prompt, but it's inconsistent and they can potentially be shifted out on long prompts - I have some fixes for this coming in the near future.

The only realistic way to get tool calling on these models is to early-execute into a constrained output. That comes with its own challenges, like output quality, but I do intend to scope this in.

Sorry that you guys are running into this, but we are trying to improve this process. As for workarounds, I'd recommend modifying the template yourself with a custom prompt. YMMV, which is why we haven't done it yet. But worth a shot. Going to close this out for now.
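
For anyone attempting the prompt workaround in the meantime, a hedged sketch: the endpoint and request fields follow the documented /api/chat API, but the prompt wording, model tag, and tool name are illustrative, not a recipe from the ollama team.

// Hedged sketch of the prompt-only workaround: describe the tool in a
// system message, ask for a bare JSON call, and parse it client-side.
// Because no "tools" field is sent, the capability check that produces
// the "does not support tools" error should never fire.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type chatMsg struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatReq struct {
	Model    string    `json:"model"`
	Stream   bool      `json:"stream"`
	Messages []chatMsg `json:"messages"`
}

type chatResp struct {
	Message chatMsg `json:"message"`
}

func main() {
	// Illustrative system prompt and tool; adapt to your own tools.
	system := `You may call the tool get_weather(location: string).
To call it, reply with ONLY this JSON: {"name":"get_weather","parameters":{"location":"..."}}`

	body, _ := json.Marshal(chatReq{
		Model:  "deepseek-r1:14b",
		Stream: false,
		Messages: []chatMsg{
			{Role: "system", Content: system},
			{Role: "user", Content: "What is the weather like in Paris today?"},
		},
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out chatResp
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	// Strip any <think> block (see the earlier sketches) before parsing
	// the JSON call out of the content yourself.
	fmt.Println(out.Message.Content)
}

The trade-off is that the client owns parsing and validation, and output quality will vary by model size, which matches the YMMV caveat above.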

@antarasi commented on GitHub (Dec 3, 2025):

If you're not going to support tool calling for deepseek-r1, please consider removing the "tools" tag from the model page:
https://ollama.com/library/deepseek-r1

It's confusing users that this model supports tool calling.

@ver007 commented on GitHub (Dec 4, 2025):

If you're not going to support tool calling for deepseek-r1, please consider removing the "tools" tag from the model page: https://ollama.com/library/deepseek-r1

It's confusing users that this model supports tool calling.

The developers of this project harbored racial prejudice and deliberately selectively supported certain features of the model.

@rafaelcapucho commented on GitHub (Mar 6, 2026):

If you're not going to support tool calling for deepseek-r1, please consider removing the "tools" tag from the model page: https://ollama.com/library/deepseek-r1

It's confusing users that this model supports tool calling.

You're right. I just lost many hours trying to make it work.

@Amamax commented on GitHub (Mar 30, 2026):

The models they provide support the displayed tags, just not the custom distillates you can download from somewhere else... But yeah, there is really a lack of info about how the capability discovery process works.

Reference: github-starred/ollama#67546