[GH-ISSUE #5989] Tools should support streaming=true #65783

Closed
opened 2026-05-03 22:39:07 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @drazdra on GitHub (Jul 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5989

When stream=true, Ollama doesn't return the tool request in the final "done" message; instead it streams it part by part as if it were a regular reply.

On top of that, we have no way to determine that it is a tool request, because Ollama doesn't change the role to "tool"; it stays "assistant". Because of this we cannot hide the tool-call code from users: we can't recognize that it's a tool request rather than a usual reply until we get the final "done" message with done_reason='stop', which is also not a good way to detect it.

In addition, the final "done" message doesn't carry the tool-request field when stream=true, as it does when stream=false. The field is simply absent.

Neither of these makes sense. It would make sense to:

  1. For a streaming reply, use a special role in the message that carries a tool request, e.g. role='tool' or anything similar.
  2. In any case, present the tool request in the final "done" message, so it can be taken from there just as it can be now when stream=false.

I would suggest that the best approach is to drop the streaming flag internally when returning a tool request, since streaming is entirely unneeded there: users need to see content live, but we never show function calls to users and we can't execute them partially, so there is no reason to stream them.

So you just need to return the tool call in a single "done" object when the model produces a tool request.
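To illustrate the proposed behavior, here is a minimal client-side sketch. The chunk contents below are made-up examples following the NDJSON response shape described in this issue (fields `message`, `role`, `content`, `tool_calls`, `done`, `done_reason`), not real API output: content chunks are rendered live, while the tool call is taken only from the single final "done" object.

```python
import json

# Hypothetical stream as proposed in this issue: content is streamed chunk by
# chunk, but the tool request arrives whole on the final "done" object.
PROPOSED_STREAM = [
    '{"message": {"role": "assistant", "content": "Let me check "}, "done": false}',
    '{"message": {"role": "assistant", "content": "the weather."}, "done": false}',
    ('{"message": {"role": "assistant", "content": "", '
     '"tool_calls": [{"function": {"name": "get_weather", '
     '"arguments": {"city": "Paris"}}}]}, '
     '"done": true, "done_reason": "stop"}'),
]

def consume(stream_lines):
    """Split a streamed reply into user-visible text and tool calls.

    Content chunks can be shown to the user as they arrive; tool calls are
    read only from the final "done" chunk, so the UI never renders them.
    """
    visible, tool_calls = [], []
    for line in stream_lines:
        chunk = json.loads(line)
        msg = chunk.get("message", {})
        visible.append(msg.get("content", ""))
        if chunk.get("done"):
            tool_calls = msg.get("tool_calls", [])
    return "".join(visible), tool_calls
```

With this shape, a UI can print content chunks live and act on `tool_calls` once the final chunk lands, without ever having to guess whether a partial reply is secretly a tool request.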

More background:
The reason we cannot simply disable streaming when using tools is that tools are supposed to be called spontaneously during the chat, whenever the model decides it needs a tool and requests it.

This means the tool request happens in the middle of a regular chat, and we cannot predict when the model will make one.

GiteaMirror added the feature request label 2026-05-03 22:39:07 -05:00
Author
Owner

@vertrue commented on GitHub (Jul 26, 2024):

Dupe #5796
Also PR #5915 is waiting for review

Author
Owner

@drazdra commented on GitHub (Jul 26, 2024):

> Dupe #5796
> Also PR #5915 is waiting for review

Does your patch make it possible to see that a stream has started for a tool reply rather than a regular reply, so the UI knows the reply should not be shown to the user?

Author
Owner

@vertrue commented on GitHub (Jul 26, 2024):

> > Dupe #5796
> > Also PR #5915 is waiting for review
>
> Does your patch make it possible to see that a stream has started for a tool reply rather than a regular reply, so the UI knows the reply should not be shown to the user?

Actually no, because the current setup for tools in Ollama generates the tool instructions directly into the stream.
You can build my branch and check whether it matches your requirements. If yes, that can help with the review.
If not, please update your issue so the developers can see what is expected.

Author
Owner

@drazdra commented on GitHub (Jul 26, 2024):

> > > Dupe #5796
> > > Also PR #5915 is waiting for review
> >
> > Does your patch make it possible to see that a stream has started for a tool reply rather than a regular reply, so the UI knows the reply should not be shown to the user?
>
> Actually no, because the current setup for tools in Ollama generates the tool instructions directly into the stream. You can build my branch and check whether it matches your requirements. If yes, that can help with the review. If not, please update your issue so the developers can see what is expected.

My issue already includes that point in the original description.

Author
Owner

@jmorganca commented on GitHub (Sep 4, 2024):

Thanks for the issue @drazdra, will merge this with https://github.com/ollama/ollama/issues/5796 so we have the conversation in one place.

Reference: github-starred/ollama#65783