[GH-ISSUE #10870] **Feature Request: Robust Streaming Support for the Full Tool Calling Lifecycle (including subsequent responses)** #7142

Closed
opened 2026-04-12 19:08:52 -05:00 by GiteaMirror · 4 comments

Originally created by @DannyWhyze on GitHub (May 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10870

Dear Ollama Team,

First and foremost, a massive thank you for your incredible work on Ollama! It's an exceptionally valuable project that significantly advances local AI development and fosters independence from major providers. The ability to run powerful models locally and with full control is a game-changer for many developers and researchers.

I'm writing today to request a feature that, in my opinion, would make Ollama even more powerful and drastically improve its integration into existing frameworks: **robust and optimized streaming support for the entire tool-calling lifecycle.** This is crucial for models capable of tool/function calling, and we hope this support can be implemented comprehensively, whether accessed via the OpenAI-compatible endpoint (`/v1`) or through native Ollama API interactions where tool calling is or will be supported.

Specifically, this refers to the model's ability to:

  1. Stream the initial response (which might contain tool calls or be a direct answer) in chunks.
  2. Stream tool call information (name, arguments, ID) as soon as the model identifies them.
  3. After the client executes the tools and sends back the results, stream the model's final, consolidated response completely. (This full round trip is sketched in code right after this list.)
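
For illustration, here is a minimal sketch of what that three-step lifecycle looks like from the client side, using the official `openai` Python package against the `/v1` endpoint. It assumes a default local Ollama install; the model name and the `get_weather` tool (with its stubbed result) are hypothetical placeholders, not part of any real API.

```python
# Minimal sketch of the full streaming tool-call lifecycle against
# Ollama's OpenAI-compatible endpoint. The model name and get_weather
# tool are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]

# Steps 1 + 2: stream the initial response; tool-call deltas arrive
# incrementally and must be accumulated per tool-call index.
calls = {}
for chunk in client.chat.completions.create(
    model="some-tool-capable-model",  # placeholder
    messages=messages,
    tools=tools,
    stream=True,
):
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    for tc in delta.tool_calls or []:
        call = calls.setdefault(tc.index, {"id": "", "name": "", "args": ""})
        call["id"] = tc.id or call["id"]
        if tc.function:
            call["name"] += tc.function.name or ""
            call["args"] += tc.function.arguments or ""

# Step 3: execute each requested tool (stubbed here), send the results
# back, and stream the model's final, consolidated answer.
messages.append({
    "role": "assistant",
    "tool_calls": [{
        "id": c["id"], "type": "function",
        "function": {"name": c["name"], "arguments": c["args"]},
    } for c in calls.values()],
})
for c in calls.values():
    messages.append({
        "role": "tool",
        "tool_call_id": c["id"],
        "content": json.dumps({"temp_c": 21}),  # stubbed tool result
    })

for chunk in client.chat.completions.create(
    model="some-tool-capable-model", messages=messages, stream=True,
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```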

**Why is this so crucial?**

  • **Achieving Parity with Leading Models:** Features like seamless streaming of tool calls and subsequent responses are vital to align with models from OpenAI (ChatGPT) or Google (Gemini), especially for complex agentic workflows and multi-turn conversations involving tool use.
  • **Critical for Framework Integration:** Many popular frameworks such as LangChain, Autogen, Google ADK, and others heavily rely on this type of streaming interaction for tool calling. Native and robust support would significantly simplify the integration of Ollama models into these ecosystems and enhance the user experience. Developers could then more reliably use Ollama models as a drop-in alternative across various integration points.
  • **Simplifying Complex Application Development:** For developers aiming to build advanced, agent-like applications with local LLMs, this would greatly simplify the process and expand possibilities.
  • **Strengthening Independence:** By providing these advanced features robustly and locally, Ollama further strengthens the vision of independence and democratized access to AI technology.

**Current Challenges with Workarounds:**
Currently, many in the community (myself included) are attempting to implement this behavior using various workarounds. For instance, when using frameworks like LangChain, attempts are often made by leveraging adapters like the `ChatOpenAI` client pointed at Ollama's `/v1` endpoint for its tool-calling capabilities. However, **these workarounds are proving to be highly unreliable and often barely functional for achieving a smooth, fully-streamed tool call and response sequence.** The streaming of the *initial* tool-calling intent might partially work, but getting a reliably streamed *final* answer *after* tool execution via such methods is fraught with issues and inconsistencies. This makes building robust, production-ready agents very difficult.
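
For concreteness, a minimal sketch of that workaround (not a recommendation): it assumes `langchain-openai` is installed and a default local Ollama install, and the model name and `get_weather` tool are hypothetical placeholders.

```python
# Sketch of the ChatOpenAI-pointed-at-Ollama workaround described above.
# Assumes langchain-openai is installed; names are placeholders.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but the client requires one
    model="some-tool-capable-model",  # placeholder
)

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"21 degrees C in {city}"

llm_with_tools = llm.bind_tools([get_weather])

# Streaming the *initial* tool-calling intent partially works: chunks may
# carry incremental tool_call_chunks the caller must stitch together. The
# post-tool-execution final answer is where streaming tends to break down,
# as described above.
for chunk in llm_with_tools.stream("What's the weather in Berlin?"):
    print(chunk.tool_call_chunks or chunk.content)
```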

A first-class, native implementation of this full streaming lifecycle directly within Ollama, for all relevant API endpoints supporting tool calls, would provide significantly more stability, a much better developer experience, and unlock more advanced use cases.

We would be incredibly grateful if this feature could be given high priority on your roadmap. It would vastly increase Ollama's attractiveness and utility for an even broader range of applications.

Thank you for your time and your outstanding work! I'm happy to provide further details, assist with testing, or offer any feedback needed.

Best regards

GiteaMirror added the feature request label 2026-04-12 19:08:52 -05:00

@ParthSareen commented on GitHub (May 27, 2025):

Hey @DannyWhyze!

Excited to get this in for the next launch :)
https://github.com/ollama/ollama/pull/10415


@jmorganca commented on GitHub (May 29, 2025):

This is now available in 0.8.0! https://github.com/ollama/ollama/releases/tag/v0.8.0
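
For anyone finding this later, here is a minimal sketch of exercising the new streaming support against the native `/api/chat` endpoint (assuming Ollama 0.8.0 or newer running locally; the model name and tool schema are placeholders):

```python
# Minimal sketch: streaming tool calls via the native /api/chat endpoint.
# Assumes Ollama >= 0.8.0 locally; model and tool are placeholders.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "some-tool-capable-model",  # placeholder
        "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "stream": True,
    },
    stream=True,
)

# The response is newline-delimited JSON: text arrives in message.content,
# tool calls in message.tool_calls as the model emits them.
for line in resp.iter_lines():
    if not line:
        continue
    part = json.loads(line)
    msg = part.get("message", {})
    if msg.get("tool_calls"):
        print("tool call:", msg["tool_calls"])
    if msg.get("content"):
        print(msg["content"], end="", flush=True)
    if part.get("done"):
        break
```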


@DannyWhyze commented on GitHub (May 29, 2025):

Hey @ParthSareen
I was really looking forward to seeing #10415 land.
And now it's here, and it's absolutely fantastic! I jumped right in and tried it out immediately.
It's made using Ollama even more enjoyable now.
A huge thank you to the entire team for their incredible work on this!
Cheers,
Danny Whyze


@DannyWhyze commented on GitHub (May 29, 2025):

> This is now available in 0.8.0! https://github.com/ollama/ollama/releases/tag/v0.8.0

Thanks so much! I actually had to force myself to go to sleep because I spent the whole time re-setting up the backend and tweaking the frontend. This update is fantastic!

Reference: github-starred/ollama#7142