[GH-ISSUE #11407] Streaming (sometimes?) breaks tool calling #7529

Closed
opened 2026-04-12 19:37:51 -05:00 by GiteaMirror · 3 comments

Originally created by @ccreutzi on GitHub (Jul 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11407

Originally assigned to: @jmorganca on GitHub.

What is the issue?

Combining tool calling and streaming can cause Ollama to miss the tool calls and instead send stream fragments that start with `[TOOL_CALLS][{\"`.

Test call:

curl http://localhost:11434/api/chat -s -d '{
   "model": "mistral-nemo",
   "messages": [
     {
       "role": "user",
       "content": "What is the largest eigenvalue of the 3-by-3 magic square?"
     }
   ],
   "stream": true,
   "tools": [
     {
       "type": "function",
       "function": {
         "name": "eig",
         "description": "Get the eigenvalues of a square matrix",
         "parameters": {
           "type": "object",
           "properties": {
             "A": {
               "type": "array",
               "items": {
                "type": "number",
                "x" : {
                  "type": "number",
                  "description": "A matrix entry"
                }
               },
               "description": "The matrix, given by its components as a vector"
             }
           },
           "required": ["A"]
         }
       }
     }
   ]
 }' | jq

Relevant log output

{
  "model": "mistral-nemo",
  "created_at": "2025-07-14T07:19:37.854857Z",
  "message": {
    "role": "assistant",
    "content": "[TOOL_CALLS][{\""
  },
  "done": false
}
{
  "model": "mistral-nemo",
  "created_at": "2025-07-14T07:19:37.996332Z",
  "message": {
    "role": "assistant",
    "content": "name"
  },
  "done": false
}
{
  "model": "mistral-nemo",
  "created_at": "2025-07-14T07:19:38.130948Z",
  "message": {
    "role": "assistant",
    "content": "\":"
  },
  "done": false
}
...


Rarely, I may also see the expected output from the same call:

{
  "model": "mistral-nemo",
  "created_at": "2025-07-14T07:24:01.985852Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "eig",
          "arguments": {
            "A": [
              8,
              1,
              6,
              3,
              5,
              7,
              4,
              9,
              2
            ]
          }
        }
      }
    ]
  },
  "done": false
}
{
  "model": "mistral-nemo",
  "created_at": "2025-07-14T07:24:02.278489Z",
  "message": {
    "role": "assistant",
    "content": ""
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 6287407208,
  "load_duration": 72818875,
  "prompt_eval_count": 97,
  "prompt_eval_duration": 170429791,
  "eval_count": 45,
  "eval_duration": 6043055875
}
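
While the server does not detect the call, a client could reassemble the leaked fragments into the same structure as the expected output. This is only a rough sketch of such a workaround, not anything Ollama ships: `recover_tool_calls` is a hypothetical helper, and the payload shape (a JSON array of objects with `name` and `arguments`) is assumed from the visible fragments, since the rest of the failing stream is elided in the log.

```python
import json

TOOL_PREFIX = "[TOOL_CALLS]"  # marker seen at the start of the leaked content


def recover_tool_calls(chunks):
    """Join streamed content fragments and parse a leaked tool-call payload.

    `chunks` is a list of decoded /api/chat stream objects.
    Returns a list of tool-call dicts, or None if the text is not a tool call.
    """
    text = "".join(c.get("message", {}).get("content", "") for c in chunks)
    text = text.lstrip()
    if not text.startswith(TOOL_PREFIX):
        return None
    # Tolerate optional whitespace between the marker and the JSON array.
    payload = text[len(TOOL_PREFIX):].lstrip()
    try:
        # raw_decode ignores any trailing text after the first JSON value.
        calls, _ = json.JSONDecoder().raw_decode(payload)
    except json.JSONDecodeError:
        return None
    return calls if isinstance(calls, list) else [calls]


# The first three fragments are from the log; the last one is an assumed
# continuation, because the report truncates the stream with "...".
sample_chunks = [
    {"message": {"role": "assistant", "content": '[TOOL_CALLS][{"'}},
    {"message": {"role": "assistant", "content": "name"}},
    {"message": {"role": "assistant", "content": '":'}},
    {"message": {"role": "assistant",
                 "content": ' "eig", "arguments": {"A": [8, 1, 6]}}]'}},
]
recovered = recover_tool_calls(sample_chunks)
```

The helper only works once the whole stream has been collected, so it gives up the benefit of streaming for responses that turn out to be tool calls.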

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.9.5

GiteaMirror added the bug label 2026-04-12 19:37:51 -05:00

@rick-github commented on GitHub (Jul 14, 2025):

The problem appears to be that the model doesn't output the required prefix to indicate a tool call. The template specifies [`[TOOL_CALLS] [`](https://ollama.com/library/mistral-nemo:latest/blobs/f023d1ce0e55#:~:text=%7B%7B%2D%20else%20if%20.ToolCalls%20%7D%7D%5BTOOL_CALLS%5D%20%5B) but the model sometimes doesn't emit the space between the closing and opening square brackets. I tried playing around with different formats and instructions in the Modelfile but it was still somewhat inconsistent. This could presumably be fixed in the tool call parser by being relaxed about whitespace in the tool call prefix. However, mistral-nemo has never been a great tool user so perhaps a different model might give you better results.
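
To illustrate what "relaxed about whitespace" could mean, here is a small sketch of prefix matching that accepts both `[TOOL_CALLS] [` and `[TOOL_CALLS][`. It is not Ollama's parser (which is written in Go); the names `TOOL_CALL_PREFIX` and `strip_tool_prefix` are made up for this example.

```python
import re

# Match the marker, then any amount of whitespace (including none),
# then the "[" that opens the JSON array of calls.
TOOL_CALL_PREFIX = re.compile(r"\[TOOL_CALLS\]\s*\[")


def strip_tool_prefix(text):
    """Return the JSON array text after the prefix, or None if it is absent."""
    stripped = text.lstrip()
    match = TOOL_CALL_PREFIX.match(stripped)
    if match is None:
        return None
    # Step back one character so the array's opening "[" is kept.
    return stripped[match.end() - 1:]


with_space = strip_tool_prefix('[TOOL_CALLS] [{"name": "eig"}]')
without_space = strip_tool_prefix('[TOOL_CALLS][{"name": "eig"}]')
plain_text = strip_tool_prefix("The largest eigenvalue is 15.")
```

Both prefixed inputs yield the same array text, and ordinary content yields `None`. A streaming parser would also have to buffer output until enough characters have arrived to decide whether the prefix is present, which this sketch leaves out.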

@ParthSareen commented on GitHub (Jul 22, 2025):

Sorry @ccreutzi, would recommend another tool use model in the meantime but will take a look at this too!


@jmorganca commented on GitHub (Jul 23, 2025):

Thanks @rick-github and @ccreutzi. The model should be updated with the correct template. Sorry for the issue. To redownload, use `ollama pull mistral-nemo`.


Reference: github-starred/ollama#7529