[GH-ISSUE #9632] Ollama not streaming tool calling responses #32046

Closed
opened 2026-04-22 12:55:15 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @fireblade2534 on GitHub (Mar 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9632

Originally assigned to: @ParthSareen on GitHub.

What is the issue?

When I use Ollama to perform tool calling, it doesn't stream the output after the model calls the tool. To compare how Ollama and OpenAI handle this, I used the following files, which are slightly modified copies of OpenAI's tool-calling example (https://platform.openai.com/docs/guides/function-calling):

Test Ollama.py

from openai import OpenAI
import json

client = OpenAI(base_url="http://192.168.2.244:11434/v1", api_key="NONe")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for provided location in celsius.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
            },
            "required": ["location"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

messages = [{"role": "user", "content": "What's the weather like in Paris today?"}]

completion = client.chat.completions.create(
    model="hf.co/bartowski/Mistral-Small-24B-Instruct-2501-GGUF:Q4_K_L",
    messages=messages,
    tools=tools,
)
tool_call = completion.choices[0].message.tool_calls[0]

result = "Today the weather in paris is 30 degrees celcius"
messages.append(completion.choices[0].message)  # append model's function call message
messages.append({                               # append result message
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": str(result)
})

completion_2 = client.chat.completions.create(
    model="hf.co/bartowski/Mistral-Small-24B-Instruct-2501-GGUF:Q4_K_L",
    messages=messages,
    tools=tools,
    stream=True
)

for chunk in completion_2:
    print(chunk)

Test OpenAI.py

from openai import OpenAI
import json

client = OpenAI(api_key="***************************")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for provided location in celsius.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
            },
            "required": ["location"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

messages = [{"role": "user", "content": "What's the weather like in Paris today?"}]

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
)
tool_call = completion.choices[0].message.tool_calls[0]

result = "The weather in paris is 30 degrees celcius"
messages.append(completion.choices[0].message)  # append model's function call message
messages.append({                               # append result message
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": str(result)
})

completion_2 = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    stream=True
)

for chunk in completion_2:
    print(chunk)
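For reference, the loop at the end of both scripts can be made robust to either server behavior with a small (hypothetical) helper that simply accumulates content deltas. This is a workaround sketch, not the fix the issue asks for: it collects the full reply whether the server streams token-by-token (as OpenAI does) or emits the whole reply in a single chunk (Ollama's current behavior):

```python
def collect_stream(chunks):
    """Join the content deltas from a chat.completions stream.

    Works for both behaviors shown in the logs below: many small deltas
    (OpenAI) or one chunk carrying the entire message (Ollama).
    """
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # The final chunk (finish_reason='stop') may carry content=None.
        if delta.content:
            parts.append(delta.content)
    return "".join(parts)
```

This doesn't restore incremental display, of course; it only keeps client code from breaking while the streaming behavior differs.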

Relevant log output

fireblade2534@vscode:~/Code/Cuebert$ /home/fireblade2534/Code/Cuebert/.venv/bin/python "/home/fireblade2534/Code/Cuebert/Test Ollama.py"
ChatCompletionChunk(id='chatcmpl-432', choices=[Choice(delta=ChoiceDelta(content='Thank you for the information. So, it seems like today in Paris is quite warm with a temperature of 30 degrees Celsius. To give you more specific details, I would typically need to check the latest weather forecast from a reliable source. However, since specific forecasts can change and real-time data may vary, please consider verifying with a trusted weather service like the Meteorological Office or a weather app for the most accurate and up-to-date information.\n\nWould you like recommendations on what to do in Paris given this warm weather?', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1741638022, model='hf.co/bartowski/Mistral-Small-24B-Instruct-2501-GGUF:Q4_K_L', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_ollama', usage=None)
(cuebert) fireblade2534@vscode:~/Code/Cuebert$ /home/fireblade2534/Code/Cuebert/.venv/bin/python "/home/fireblade2534/Code/Cuebert/Test OpenAI.py"
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content='The', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=' current', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=' temperature', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=' in', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=' Paris', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=' is', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=' ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content='30', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=' degrees', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=' Celsius', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content='.', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)
ChatCompletionChunk(id='chatcmpl-B9dtt39CYgA2gPATGyw1g1T2fsJ32', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1741638029, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier='default', system_fingerprint='fp_f9f4fb6dbf', usage=None)

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.5.13

GiteaMirror added the bug label 2026-04-22 12:55:15 -05:00

@rick-github commented on GitHub (Mar 10, 2025):

https://github.com/ollama/ollama/issues/8887#issuecomment-2640721896


@fireblade2534 commented on GitHub (Mar 10, 2025):

I'm using the OpenAI API, and considering it's supposed to be compatible, I think it should comply with how OpenAI's API works.


@UlrikWKoren commented on GitHub (Mar 12, 2025):

Please FIX This, we need to be able to stream while we have given tools. Please!


@seroperson commented on GitHub (May 4, 2025):

It actually breaks using Ollama with AI assistants like avante.nvim, codecompanion, and others. Can't wait for this to be resolved.


@TheLazyLemur commented on GitHub (May 7, 2025):

I see this is currently being worked on:

https://github.com/ollama/ollama/issues/8887#issuecomment-2831547411


@rick-github commented on GitHub (May 7, 2025):

#10415


Reference: github-starred/ollama#32046