[GH-ISSUE #7014] Better Tool Call parsing #30205

Closed
opened 2026-04-22 09:43:57 -05:00 by GiteaMirror · 6 comments

Originally created by @zly2006 on GitHub (Sep 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7014

Originally assigned to: @ParthSareen on GitHub.

Currently, tool call patterns are defined in Go templates. This is fine for cases like the one in [this comment](https://github.com/ollama/ollama/issues/6061#issuecomment-2257137350). However, it is not ideal.

Problems

  1. Content loss

Say the model responds with this text:

```plaintext
Yes, I can help you compute 3+4 with python
<tool_call>
{"name":"python", "args": {"expr":"3+4"}}
</tool_call>
```

In [this line](https://github.com/ollama/ollama/blob/cd5c8f6471abf32965289f0226016a78f0c5c938/server/routes.go#L1480), all content is removed. So if the model provided some information, like the first sentence above, it is discarded arbitrarily.

  2. Streaming

Tool calls do NOT support streaming; parsing only happens once the full content has been received. However, tool calls can be detected incrementally, before the full content is known.

  3. Other format support

[Here](https://github.com/ollama/ollama/blob/d05da2991245cfa0cd8da0bda476c626e26caaec/server/model.go#L301), the parsing function only supports JSON. Should we support other formats, for example XML, in the future?

Solution

TL;DR: For problems 1 and 2, we can use the [Aho–Corasick algorithm](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm) for parsing. Define a pattern syntax where `@@json{name, args}@@` matches a JSON object like `{"name":"python", "args": {"expr":"3+4"}}` (e.g. llama3.2); the pattern `<tool_call>@@json{name,args}@@</tool_call>` could then be used for the output above.

Even when not in streaming mode, we can still process the output as a stream. Each time a token is received, the state machine checks whether the token could be part of the tool call syntax. This fundamentally resolves the first problem, because we only remove the parts of the output that we are sure belong to a tool call.

So I think a state machine is a good fit for "guessing" whether the output is a valid tool call. Keep in mind that the model may also emit ordinary JSON (e.g. when the user asks it to process a JSON file), so we should never assume a JSON object is a tool call; we are only guessing. When a new token arrives, we try to match it against the state machine. If it matches, it is possibly part of a tool call, so we do not send the token to the client and hold it temporarily. When we reach the end of the pattern string (e.g. we finally match `</tool_call>`), it is a valid tool call and we can drop all held tokens. Otherwise, when the state machine fails to match, the JSON is not a tool call and should be sent to the client: we flush all held tokens and reset the state machine. A minimal sketch of this hold/flush loop follows.
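To make the hold/flush idea concrete, here is a minimal Go sketch. It matches only a single literal open tag; the `StreamMatcher` type and its `Feed` method are hypothetical names for illustration, not existing Ollama code, and a real implementation would track the full pattern (e.g. via Aho–Corasick) including the JSON body.

```go
package main

import (
	"fmt"
	"strings"
)

// StreamMatcher sketches the hold/flush idea: while the accumulated text is
// still a possible prefix of the literal tag, tokens are held back; on a
// mismatch they are flushed to the client; on a full match the caller can
// switch to collecting the tool call body instead.
type StreamMatcher struct {
	tag  string // literal pattern, e.g. "<tool_call>"
	held strings.Builder
}

// Feed consumes one token and returns any text that is now safe to send to
// the client, plus whether the full tag has been matched.
func (m *StreamMatcher) Feed(token string) (emit string, matched bool) {
	m.held.WriteString(token)
	// The proposed pattern syntax ignores surrounding whitespace.
	buf := strings.TrimSpace(m.held.String())

	switch {
	case strings.HasPrefix(buf, m.tag):
		// Full tag seen: drop the held tokens, enter tool-call mode.
		m.held.Reset()
		return "", true
	case strings.HasPrefix(m.tag, buf):
		// Still a plausible prefix: keep holding.
		return "", false
	default:
		// Mismatch: this was ordinary content, flush everything held.
		out := m.held.String()
		m.held.Reset()
		return out, false
	}
}

func main() {
	m := &StreamMatcher{tag: "<tool_call>"}
	for _, tok := range []string{"<tool", "_call", ">"} {
		emit, matched := m.Feed(tok)
		fmt.Printf("token=%q emit=%q matched=%v\n", tok, emit, matched)
	}
}
```

Feeding the tokens `"<tool"`, `"_call"`, `">"` holds everything and reports a match on the third token; feeding `"Yes,"` instead would flush it to the client immediately.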

Now let's talk about the pattern string. I currently design it like a regular expression: all characters not wrapped in `@@` are matched literally (spaces, `\t`, `\r`, and `\n` are allowed everywhere in both the pattern and the matched string, and are ignored). So `<tool_call>` and `</tool_call>` can be matched.

The model output is NOT reliable. When we test the qwen model, it sometimes does not output `<tool_call>` but some random token instead, even though the JSON is still valid. For these situations we can mark a segment as optional with `@<match as is>@?` in the pattern string, for example (a sketch of splitting such a pattern follows):
`@<tool_call>@? @@json{name,args}@@ @</tool_call>@?`
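As an illustration of how such a pattern string might be decomposed, here is a rough Go sketch that splits a pattern into literal, optional-literal (`@<...>@?`), and JSON-placeholder (`@@json{...}@@`) segments. The `segment` type and `parsePattern` function are assumptions for this sketch, and error handling for malformed patterns is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// segment is one piece of a parsed pattern string: literal text, an
// optional literal (@<...>@?), or a placeholder (@@...@@).
type segment struct {
	kind string // "literal", "optional", or "json"
	text string
}

// parsePattern splits a pattern such as
//   @<tool_call>@? @@json{name,args}@@ @</tool_call>@?
// into segments. Malformed patterns are not handled in this sketch.
func parsePattern(pattern string) []segment {
	var segs []segment
	s := pattern
	for {
		s = strings.TrimLeft(s, " \t\r\n") // whitespace is insignificant
		if s == "" {
			return segs
		}
		switch {
		case strings.HasPrefix(s, "@@"):
			// Placeholder: everything between the @@ pairs, e.g. "json{name,args}".
			end := strings.Index(s[2:], "@@") + 2
			segs = append(segs, segment{kind: "json", text: s[2:end]})
			s = s[end+2:]
		case strings.HasPrefix(s, "@<"):
			// Optional literal: everything between the leading @ and the @?.
			end := strings.Index(s, "@?")
			segs = append(segs, segment{kind: "optional", text: s[1:end]})
			s = s[end+2:]
		default:
			// Plain literal: runs until the next marker (or the end).
			end := strings.IndexByte(s[1:], '@')
			if end < 0 {
				end = len(s)
			} else {
				end++
			}
			segs = append(segs, segment{kind: "literal", text: s[:end]})
			s = s[end:]
		}
	}
}

func main() {
	for _, seg := range parsePattern("@<tool_call>@? @@json{name,args}@@ @</tool_call>@?") {
		fmt.Printf("%-8s %q\n", seg.kind, seg.text)
	}
}
```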

Next, let's talk about how to ensure the JSON object is a valid tool call, rather than something the user asked the model to output. I'm designing it like this:
`@@json@@` means it is JSON and matches any JSON. You can also specify the shape of the object to validate it. E.g. `@@json{name}@@` means the `name` field of the JSON object must be neither `undefined` nor `null`; `@@json{name:string}@@` further specifies that the type must be string. (A validation sketch follows the table below.)

All supported types are listed below:

| pattern | example |
| --- | --- |
| `any` | |
| `string` | `"text"` |
| `number` | `114514` |
| `[type]` | `[number]` => `[1, 2, 3, 4]` |
| `{type of values}` | `{number}` => `{"name1": 1, "name2": 5}` |
| `{name of field: type}` | `{name:string,args:any}` |
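To show how the `{name of field: type}` form could validate a decoded object, here is a hedged Go sketch. The `validateFields` function and its map-based spec are illustration-only assumptions; array (`[type]`) and value-typed map (`{type}`) checks are omitted for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateFields sketches @@json{name:string,args:any}@@ validation:
// every listed field must be present and non-null, and if a type other
// than "any" is given, the decoded Go value must match it.
func validateFields(obj map[string]any, spec map[string]string) bool {
	for field, typ := range spec {
		v, ok := obj[field]
		if !ok || v == nil { // the field must be neither missing nor null
			return false
		}
		switch typ {
		case "any":
			// presence and non-null are enough
		case "string":
			if _, ok := v.(string); !ok {
				return false
			}
		case "number":
			// encoding/json decodes every JSON number into float64
			if _, ok := v.(float64); !ok {
				return false
			}
		default:
			return false // array/map specs omitted in this sketch
		}
	}
	return true
}

func main() {
	raw := `{"name":"python", "args": {"expr":"3+4"}}`
	var obj map[string]any
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		panic(err)
	}
	spec := map[string]string{"name": "string", "args": "any"}
	fmt.Println(validateFields(obj, spec)) // prints: true
}
```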

Welcome to share your opinions here.

Related: #5796

GiteaMirror added the feature request label 2026-04-22 09:43:57 -05:00

@YonTracks commented on GitHub (Sep 28, 2024):

Hi, testing and checking the docs for llama.cpp and llama3.1 / 3.2, I see the tool/function calling and built-in tools,
`[system, assistant, user, ipython]` and `<|python_tag|>`.

I see models trying to run the python built-in tools automatically, like a bug? but not? especially llama3.2 3b, but yes, the template etc.
OpenAI has a similar way, and other ways too: JSON, formatting, structure, function calling, built-in tools / assistants, etc.

With the system prompt in a modelfile and other ollama methods, you can get different parsing/tags to work, but
I'm hoping / guessing ollama is making this all work (what a team :), love it), with little if any breaking changes, and safely.
Can't even imagine all the ways that this kind of thing can be done lol.

Cheers for finding another method; I think everything helps, and ollama will sort it.
Happy days.


@whats2000 commented on GitHub (Sep 28, 2024):

I have a solution in a client application for streaming, by checking the pattern with a flag. Maybe this is not the best solution, but I hope it can help improve the current situation.

Referenced commit: https://github.com/whats2000/CodeBRT/commit/8e7c2c6a267d021d7674959682007f4342afca7c


@YonTracks commented on GitHub (Sep 28, 2024):

Far out, wow, ok, I see lol. I have not even started on ollama-js (don't ask me how I missed it lol).
I love JS/TS, always learning, but I am using the API directly: Windows desktop ollama lol, then I `fetch("http://localhost:11434/api/generate", ...)` or chat or whatever lol.
Love these, super cheers, this actually helps me a lot. More fun lol.
Demo at https://github.com/YonTracks/yon-ollama-gui


@YonTracks commented on GitHub (Sep 28, 2024):

Mind is like blown lol, there's the problem? Use the API directly?

I've been giving tips and stuff all based on the direct API, no client lol;
on top of my already bad grammar lol, and the cryptic mess that I provide lol, it's no wonder no one listened lol.
Good luck, cheers.


@whats2000 commented on GitHub (Oct 22, 2024):

The tool-call formats that models output vary greatly. I have recently come up with an idea for an application: post-process and extract content from the stream response. You can check the WIP [parseToolCall.ts](https://github.com/whats2000/CodeBRT/blob/Agent-Tool/VSCodeExtension/code-brt/src/services/languageModel/utils/parseToolCall.ts).

For the application, we still keep the full streaming output, but after it finishes we use that function to clean up the string and try to extract the tool call. This works pretty well! (A rough Go sketch of the idea appears after the showcase below.)

Check the showcase:
![2024-10-22 22-34-40](https://github.com/user-attachments/assets/d8c8ed2e-ae03-4133-a86c-05b1d5e61491)
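The linked implementation is TypeScript; as a rough Go illustration of the same after-the-fact extraction (the regex, function name, and behavior here are assumptions for the sketch, not CodeBRT's actual code):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// toolCallRE looks for a <tool_call>...</tool_call> block in the finished
// response; (?s) lets . span newlines, and the lazy body stops at the
// first closing tag.
var toolCallRE = regexp.MustCompile(`(?s)<tool_call>\s*(\{.*?\})\s*</tool_call>`)

// extractToolCall returns the visible content with the block stripped out,
// plus the raw JSON body of the tool call (empty if none was found).
func extractToolCall(content string) (cleaned, call string) {
	m := toolCallRE.FindStringSubmatch(content)
	if m == nil {
		return content, ""
	}
	cleaned = strings.TrimSpace(toolCallRE.ReplaceAllString(content, ""))
	return cleaned, m[1]
}

func main() {
	out := "Yes, I can help you compute 3+4 with python\n" +
		"<tool_call>\n{\"name\":\"python\", \"args\": {\"expr\":\"3+4\"}}\n</tool_call>"
	text, call := extractToolCall(out)
	fmt.Printf("text: %q\ncall: %q\n", text, call)
}
```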


@allenporter commented on GitHub (Jan 1, 2025):

I feel like this is similar to the discussion in #6002 about using grammars to parse tool calls. However, one open question is how much to *force* the response to be a tool call.


Reference: github-starred/ollama#30205