[PR #10415] tools: refactor tool call parsing and enable streaming #13236

Closed
opened 2026-04-13 00:21:44 -05:00 by GiteaMirror · 0 comments
Owner

Original Pull Request: https://github.com/ollama/ollama/pull/10415

State: closed
Merged: Yes


Demo

Simple tool calling

https://github.com/user-attachments/assets/fe523ef1-904a-4ab2-aaab-d43bec1c35e6

Search tool calling

https://github.com/user-attachments/assets/a2fd71b8-67c3-4c87-9daa-cdc3a23fc783

Incremental Parsing

  • Enables streaming, and eventually extending to other types of tools, e.g. Python
  • Dynamically finds the tool special token to use as the prefix check
  • Still allows JSON tool parsing as a fallback if the response starts with a JSON-parsable type
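The prefix check plus JSON fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual ollama implementation: the `toolCall` fields, the `incrementalParser` type, and the prefix string are all assumptions for the example.

```go
package main

import (
	"encoding/json"
	"strings"
)

// toolCall is a minimal stand-in for a parsed tool call
// (field names here are illustrative, not ollama's actual types).
type toolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// incrementalParser accumulates streamed chunks and tries to detect a
// tool call, either after a model-specific prefix token found in the
// template or, as a fallback, when the response itself starts with a
// JSON value.
type incrementalParser struct {
	prefix string // tool special token from the template, e.g. "<tool_call>"
	buf    strings.Builder
}

// Add appends a streamed chunk and returns ok=true once the accumulated
// buffer contains a complete, parsable tool call.
func (p *incrementalParser) Add(chunk string) (tc toolCall, ok bool) {
	p.buf.WriteString(chunk)
	s := strings.TrimSpace(p.buf.String())

	if p.prefix != "" {
		if !strings.HasPrefix(s, p.prefix) {
			return tc, false // no prefix yet: treat as plain content
		}
		s = strings.TrimSpace(strings.TrimPrefix(s, p.prefix))
	} else if !strings.HasPrefix(s, "{") && !strings.HasPrefix(s, "[") {
		return tc, false // fallback only fires for a JSON-parsable start
	}

	// json.Unmarshal fails on incomplete JSON, so partial chunks
	// simply wait for more streamed data.
	if err := json.Unmarshal([]byte(s), &tc); err != nil {
		return toolCall{}, false
	}
	return tc, true
}
```

Because parsing is retried on each chunk, the tool call surfaces as soon as the JSON becomes complete, which is what makes streaming possible.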

Breaking Changes

This PR also warrants a change to the qwen2.5-coder template due to incremental JSON parsing and the focus on tool prefixes.

There is a possibility that other models will break, in a couple of ways:

  1. The model has a tool prefix defined but does not output the prefix before making a tool call.
  2. The model has no prefix and does not output a JSON tool call right away. Previously we'd greedily parse over the accumulated content and extract all JSON.

I'd say these are model-specific problems that should be addressed through training or templating.
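The second breaking case can be made concrete with a sketch of the old greedy behavior versus the new start-anchored behavior. Both functions here are illustrative assumptions, not ollama's actual code:

```go
package main

import (
	"encoding/json"
	"strings"
)

// oldGreedyParse illustrates the previous behavior: scan the whole
// accumulated content for the first '{' and try to parse JSON from
// there, so a tool call buried after prose was still found.
func oldGreedyParse(content string) (map[string]any, bool) {
	i := strings.Index(content, "{")
	if i < 0 {
		return nil, false
	}
	var v map[string]any
	if err := json.Unmarshal([]byte(content[i:]), &v); err != nil {
		return nil, false
	}
	return v, true
}

// newStrictParse illustrates the incremental behavior: without a tool
// prefix, only content that *starts* with a JSON value is treated as a
// candidate tool call; anything else streams through as plain content.
func newStrictParse(content string) (map[string]any, bool) {
	s := strings.TrimSpace(content)
	if !strings.HasPrefix(s, "{") && !strings.HasPrefix(s, "[") {
		return nil, false
	}
	var v map[string]any
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		return nil, false
	}
	return v, true
}
```

A model that writes prose such as `Sure, calling the tool: {...}` before its JSON would be caught by the old greedy scan but not by the new strict check, which is exactly the failure mode described in point 2.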

Closes: https://github.com/ollama/ollama/issues/7014, https://github.com/ollama/ollama/issues/7886, https://github.com/ollama/ollama/issues/9632, https://github.com/ollama/ollama-python/issues/463, https://github.com/ollama/ollama/issues/10712

Follow ups:

  • Template updates for qwen and llama4
  • Remove leading spaces in general
  • Consider setting done to true
  • Python function calling: https://github.com/ollama/ollama/pull/10453
  • Use full prefix instead of single token for tools
  • "Pipelining" to send JSON faster in the multiple-tool-call setting

Reference: github-starred/ollama#13236