[GH-ISSUE #8337] Cannot get a tool call and a message in the same response #5344

Closed
opened 2026-04-12 16:32:37 -05:00 by GiteaMirror · 9 comments

Originally created by @gotyer on GitHub (Jan 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8337

What is the issue?

I use the api/chat endpoint, and I can make an agent either respond normally, or call a tool. It can do either successfully, but never both in the same call.
Whenever a tool is called, the content is empty.
Even if I explicitly prompt the model to write a long text before using the tool, the content field of the response is still empty.

Here's an example of an agent calling a tool (the tool is the act of querying another agent).

```
[{
        "created_at": "2025-01-07T15:07:29.9089713Z",
        "done": false,
        "message": {
            "content": "",      ---CONTENT IS ALWAYS EMPTY---
            "role": "assistant",
            "tool_calls": [{
                    "function": {
                        "arguments": {
                            "query": "Add a new contact named Charlie with email charlie@example.com, role CFO, company Z"
                        },
                        "name": "query_contact_manager"
                    }
                }
            ]
        },
        "model": "llama3.1"
    }, {
        "created_at": "2025-01-07T15:07:29.91725Z",
        "done": true,
        "done_reason": "stop",
        "eval_count": 35,
        "eval_duration": 305000000,
        "load_duration": 2245466100,
        "message": {
            "content": "",
            "role": "assistant"
        },
        "model": "llama3.1",
        "prompt_eval_count": 478,
        "prompt_eval_duration": 181000000,
        "total_duration": 3106890500
    }
]
```

For more context, here is the API body (no unusual options are enabled):

```
data = {
    "model": self.model,
    "messages": messages,
    "tools": ollama_tools,
    "options": {
        "num_ctx": self.num_ctx,
        "temperature":0.1,
    }
}
```

Am I misunderstanding something, or is there a mistake on the ollama server?
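
For completeness, here is a minimal runnable sketch of this kind of request (a hedged reconstruction: the prompt, tool schema, and num_ctx value below are illustrative, and it assumes a local Ollama server on the default port with streaming disabled):

```python
# Minimal sketch of the request described above (illustrative values only;
# assumes a local Ollama server on the default port, stream disabled for brevity).
import requests

data = {
    "model": "llama3.1",
    "messages": [
        {"role": "user", "content": "Charlie has joined the company, add a contact."}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "query_contact_manager",
            "description": "Adds a contact",
            "parameters": {
                "type": "object",
                "required": ["query"],
                "properties": {"query": {"type": "string", "description": ""}},
            },
        },
    }],
    "stream": False,
    "options": {"num_ctx": 8192, "temperature": 0.1},
}

msg = requests.post("http://localhost:11434/api/chat", json=data).json()["message"]
print(repr(msg.get("content")))   # always '' whenever a tool call is returned
print(msg.get("tool_calls"))      # the parsed tool call, if any
```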

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.5.1

GiteaMirror added the bug label 2026-04-12 16:32:37 -05:00

@rick-github commented on GitHub (Jan 7, 2025):

Tool calls take precedence. If the output from the model contains a tool call, it is converted to tool_calls. That is, the input to the model is

```
<|start_header_id|>user
Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.

{"type":"function","function":{"name":"query_contact_manager","description":"Adds a contact","parameters":{"type":"object","required":["query"],"properties":{"query":{"type":"string","description":""}}}}}

Question: Charlie has joined the company, add a contact.
<|eot_id|>
```

The output of the model (content) is:

{"name": "query_contact_manager", "parameters": {"query":"Add a new contact named Charlie with email charlie@example.com, role CFO, company Z"}}

ollama sees that this matches the format specified for a tool call and converts it to the tool_calls structure. Since the content is what gets converted, it is cleared. Models typically do not add extra information to a tool call, but if one does, that extra text will be discarded. If you want a long text and a tool call, break it into two operations.
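
For illustration, a rough sketch of that two-step approach (assumptions: the standard /api/chat endpoint on localhost:11434 with streaming disabled; the prompts and tool schema are illustrative, taken from the example above):

```python
# Rough sketch of the "break it into two operations" suggestion (illustrative only;
# assumes the standard /api/chat endpoint with streaming disabled).
import requests

URL = "http://localhost:11434/api/chat"
TOOLS = [{
    "type": "function",
    "function": {
        "name": "query_contact_manager",
        "description": "Adds a contact",
        "parameters": {
            "type": "object",
            "required": ["query"],
            "properties": {"query": {"type": "string", "description": ""}},
        },
    },
}]

def chat(messages, tools=None):
    body = {"model": "llama3.1", "messages": messages, "stream": False}
    if tools:
        body["tools"] = tools
    return requests.post(URL, json=body).json()["message"]

history = [{"role": "user", "content": "Charlie has joined the company, add a contact."}]

# Step 1: no tools offered, so the model's plan comes back in content.
plan = chat(history + [{"role": "user", "content": "Describe your plan before acting."}])
print(plan["content"])

# Step 2: offer the tool; content will be empty and tool_calls carries the call.
action = chat(history + [{"role": "assistant", "content": plan["content"]}], tools=TOOLS)
print(action.get("tool_calls"))
```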

@gotyer commented on GitHub (Jan 8, 2025):

Okay, this makes sense.

My idea was that if we encourage the model to plan and refine its actions before calling a tool, we might get it to do more complex tasks, like some kind of CoT reasoning.

I will do some tests with the /api/generate endpoint to see if we can do some more complex tool calls this way.
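
A hedged sketch of what such an /api/generate experiment might look like: prompt the model directly, ask it to reason first and emit a tool-call JSON on the final line, and parse that line out client-side (the prompt wording and the parsing strategy are illustrative, not an ollama feature):

```python
# Sketch of a manual tool-calling experiment via /api/generate (illustrative only).
import json
import requests

prompt = (
    "You can call the tool query_contact_manager(query: string).\n"
    "First explain your reasoning, then on the final line output exactly one JSON\n"
    'object of the form {"name": ..., "parameters": {...}}.\n\n'
    "Question: Charlie has joined the company, add a contact."
)

r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1", "prompt": prompt, "stream": False},
).json()
text = r["response"]

# Treat the final line as the tool call and everything before it as the reasoning.
reasoning, _, last_line = text.rstrip().rpartition("\n")
try:
    tool_call = json.loads(last_line)
except json.JSONDecodeError:
    reasoning, tool_call = text, None

print(reasoning)
print(tool_call)
```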

@gotyer commented on GitHub (Jan 27, 2025):

With the release of models such as deepseek-r1, I think the tool-calling prompt should be adapted to encourage models to think before taking action.
Since we will see more and more reasoning models in the coming months, I really believe this would be a sensible addition to ollama.

@rick-github commented on GitHub (Jan 27, 2025):

This is a function of the model, and in the case of deepseek-r1, it does this already.

If I make a request that uses a tool:

```sh
$ curl -s localhost:11434/api/chat -d '{
  "model":"MFDoom/deepseek-r1-tool-calling",
  "stream":false,
  "messages":[
    {"role":"user","content":"what is the weather in paris"}
  ],
  "tools":[
    {"type":"function","function":{"name":"get_current_weather","description":"get the weather"}}
  ]}'
```

I get the result:

```json
{
  "model": "MFDoom/deepseek-r1-tool-calling",
  "created_at": "2025-01-27T15:25:07.447047756Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_current_weather",
          "arguments": {
            "location": "Paris",
            "units": "metric"
          }
        }
      }
    ]
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 2937773985,
  "load_duration": 359848546,
  "prompt_eval_count": 95,
  "prompt_eval_duration": 17000000,
  "eval_count": 206,
  "eval_duration": 2559000000
}
```

As explained above, the text response from the model is converted to a tool call and the content is discarded. However, if we look behind the scenes and monitor the direct output of the model, we can see that it is doing "thinking":

```console
<think>

Alright, I'm trying to figure out how to respond to this user's query.
They want me to call a function called get _current _weather and provide
its parameters.  The prompt asks for the current weather in Paris.
First, I need to look at the existing JSON template they provided.
It has an example with "get _current _weather " as the name and an
empty dictionary for parameters.  My task is to fill that dictionary
properly.  I remember that when making API calls, you usually include
parameters like location.  Since the user specified Paris, I should
set the location parameter to "Paris ".  Next, considering the units
whether they want metric or imperial measurements I'll default to
"metric" unless otherwise specified.  So, I'll add "units": "metric".
Putting it all together, the JSON should have these two keys : "location"
and "units", with their respective values.
</think>

{"name": "get_current_weather", "parameters": {"location": "Paris", "units": "metric"}}

@Master-Pr0grammer commented on GitHub (May 14, 2025):

Would it be possible to return this context under the content key? This would make more intuitive sense and would be very useful.

@rick-github commented on GitHub (May 14, 2025):

Currently ollama has no concept of 'thinking' - all of the output of the model is just a bunch of tokens converted into words, and the tool processing just extracts the tool call and discards the rest. There is ongoing work on adding the ability to distinguish thought from response (#10584), so preserving this may be possible down the track.

@Master-Pr0grammer commented on GitHub (May 14, 2025):

Oh ok cool, that's awesome! Definitely looking forward to that.

I still think throwing away the model's response, even for non-reasoning models, isn't optimal for most use cases. Having that feedback is really nice, especially in "agentic" workflows.

There's no reason why the response message can't utilize the content property it already has (but doesn't use). Instead of throwing away the model's response, maybe we could take the remaining text (after cropping out the tool call) and just package it in the content property. (This would be separate from what #10584 describes.) It wouldn't change/affect anything else afaik.

This would also allow (maybe in the future) for streaming on tool-calling requests, i.e. stream content (cropping out the tool call), then at the end token, return tool_calls with the response.

That would take more time to implement, but the former should be fairly easy. I have a bit of free time on my hands, so I might take a crack at it, but I'm not familiar with the code base or Go.
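
To make the suggestion concrete, here is a toy sketch (plain Python, not ollama's Go code) of what "crop out the tool call and keep the rest as content" could look like, under the assumption that the tool call arrives as a JSON object on its own line:

```python
# Toy illustration of the proposed behaviour: keep surrounding text as content
# instead of discarding it, and move the tool-call JSON into tool_calls.
# This is not Ollama code; the line-based detection is an assumption.
import json

def split_output(raw: str):
    content_lines, tool_calls = [], []
    for line in raw.splitlines():
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "name" in obj and "parameters" in obj:
            tool_calls.append(
                {"function": {"name": obj["name"], "arguments": obj["parameters"]}}
            )
        else:
            content_lines.append(line)
    return "\n".join(content_lines).strip(), tool_calls

raw = ('I will add the contact now.\n'
       '{"name": "query_contact_manager", "parameters": {"query": "Add Charlie"}}')
content, calls = split_output(raw)
print(content)   # "I will add the contact now."
print(calls)     # [{"function": {"name": "query_contact_manager", ...}}]
```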

@rick-github commented on GitHub (May 15, 2025):

Non-reasoning models don't return anything but the tool call, plus syntactic sugar that delimits it. There's nothing in content that's not captured in tool_calls.

If you want to see the content, delete this line (https://github.com/ollama/ollama/blob/bd68d3ae50c67ba46ee94a584fa6d0386e4b8522/server/routes.go#L1578) and build.

@Master-Pr0grammer commented on GitHub (May 15, 2025):

ok awesome thanks!

Reference: github-starred/ollama#5344