[GH-ISSUE #5796] Streaming for tool calls is unsupported #50124

New Issue

GiteaMirror · 2026-04-28T14:13:27-05:00

GiteaMirror commented

2026-04-28 14:13:27 -05:00

Originally created by @vertrue on GitHub (Jul 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5796

Originally assigned to: @ParthSareen on GitHub.

What is the issue?

Hi everyone!

I am trying to use tools in requests to llama3-groq-tool-use:70b. Here is simple code in Python using langchain==0.2.9:

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
)
from langchain.agents import AgentExecutor, create_openai_tools_agent

@tool
def function_1(a: int, b: int) -> int:
    """uses function function_1 for arguments a and b."""
    return a % b + 2

@tool
def function_2(a: int, b: int) -> int:
    """uses function function_2 for arguments a and b."""
    return a * b + 1

tools = [function_1, function_2]


llm = ChatOpenAI(
    model="llama3-groq-tool-use:70b",
    temperature=0,
)

default_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a helpful AI assistant."),
            MessagesPlaceholder("chat_history", optional=True),
            ("human", "{input}"),
            MessagesPlaceholder("agent_scratchpad"),
        ]
    )

input_message = "What is function_1(10, 11)? Also what is function_2(10, 11)?"

agent = create_openai_tools_agent(
    llm=llm,
    tools=tools,
    prompt=default_prompt
)

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    return_intermediate_steps=False,
)

res = agent_executor.invoke({"input": input_message})

print(res)

The result is following:

> Entering new AgentExecutor chain...
<tool_call>
{"id": 0, "name": "function_1", "arguments": {"a": 10, "b": 11}}
</tool_call>
<tool_call>
{"id": 1, "name": "function_2", "arguments": {"a": 10, "b": 11}}
</tool_call>

> Finished chain.
{
    'input': 'What is function_1(10, 11)? Also what is function_2(10, 11)?',
    'output': '<tool_call>\n{"id": 0, "name": "function_1", "arguments": {"a": 10, "b": 11}}\n</tool_call>\n<tool_call>\n{"id": 1, "name": "function_2", "arguments": {"a": 10, "b": 11}}\n</tool_call>'
}

If I am using langchain_community.chat_models.ollama.ChatOllama it output the same.

But if I use same model (llama3-groq-70b-8192-tool-use-preview) with groq OpenAI Compatible API, it uses tools and invokes the functions, output below:

> Entering new AgentExecutor chain...

Invoking: `function_1` with `{'a': 10, 'b': 11}`


12
Invoking: `function_2` with `{'a': 10, 'b': 11}`


111The result of function_1(10, 11) is 12, and the result of function_2(10, 11) is 111.

> Finished chain.
{
    'input': 'What is function_1(10, 11)? Also what is function_2(10, 11)?',
    'output': 'The result of function_1(10, 11) is 12, and the result of function_2(10, 11) is 111.'
}

Is it expected behaviour or this problem is it still in progress?
Many thanks

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.2.7

Originally created by @vertrue on GitHub (Jul 19, 2024). Original GitHub issue: https://github.com/ollama/ollama/issues/5796 Originally assigned to: @ParthSareen on GitHub. ### What is the issue? Hi everyone! I am trying to use tools in requests to `llama3-groq-tool-use:70b`. Here is simple code in Python using langchain==0.2.9: ``` from langchain_core.tools import tool from langchain_openai import ChatOpenAI from langchain.prompts import ( ChatPromptTemplate, MessagesPlaceholder, ) from langchain.agents import AgentExecutor, create_openai_tools_agent @tool def function_1(a: int, b: int) -> int: """uses function function_1 for arguments a and b.""" return a % b + 2 @tool def function_2(a: int, b: int) -> int: """uses function function_2 for arguments a and b.""" return a * b + 1 tools = [function_1, function_2] llm = ChatOpenAI( model="llama3-groq-tool-use:70b", temperature=0, ) default_prompt = ChatPromptTemplate.from_messages( [ ("system", "You are a helpful AI assistant."), MessagesPlaceholder("chat_history", optional=True), ("human", "{input}"), MessagesPlaceholder("agent_scratchpad"), ] ) input_message = "What is function_1(10, 11)? Also what is function_2(10, 11)?" agent = create_openai_tools_agent( llm=llm, tools=tools, prompt=default_prompt ) agent_executor = AgentExecutor( agent=agent, tools=tools, verbose=True, return_intermediate_steps=False, ) res = agent_executor.invoke({"input": input_message}) print(res) ``` The result is following: ``` > Entering new AgentExecutor chain... <tool_call> {"id": 0, "name": "function_1", "arguments": {"a": 10, "b": 11}} </tool_call> <tool_call> {"id": 1, "name": "function_2", "arguments": {"a": 10, "b": 11}} </tool_call> > Finished chain. { 'input': 'What is function_1(10, 11)? Also what is function_2(10, 11)?', 'output': '<tool_call>\n{"id": 0, "name": "function_1", "arguments": {"a": 10, "b": 11}}\n</tool_call>\n<tool_call>\n{"id": 1, "name": "function_2", "arguments": {"a": 10, "b": 11}}\n</tool_call>' } ``` If I am using `langchain_community.chat_models.ollama.ChatOllama` it output the same. But if I use same model (`llama3-groq-70b-8192-tool-use-preview`) with groq OpenAI Compatible API, it uses tools and invokes the functions, output below: ``` > Entering new AgentExecutor chain... Invoking: `function_1` with `{'a': 10, 'b': 11}` 12 Invoking: `function_2` with `{'a': 10, 'b': 11}` 111The result of function_1(10, 11) is 12, and the result of function_2(10, 11) is 111. > Finished chain. { 'input': 'What is function_1(10, 11)? Also what is function_2(10, 11)?', 'output': 'The result of function_1(10, 11) is 12, and the result of function_2(10, 11) is 111.' } ``` Is it expected behaviour or this problem is it still in progress? Many thanks ### OS Linux ### GPU Nvidia ### CPU Intel ### Ollama version 0.2.7

GiteaMirror added the bug api labels 2026-04-28 14:13:28 -05:00

GiteaMirror closed this issue

2026-04-28 14:13:33 -05:00

GiteaMirror commented

2026-04-28 14:13:36 -05:00

@rick-github commented on GitHub (Jul 19, 2024):

The dedicated tool handling is a recent addition to ollama so probably needs some tweaking. Looking at your logs, it would seem that what ollama is returning is not what langchain is expecting, so some digging through the code on both sides would be needed to match them up.

@rick-github commented on GitHub (Jul 19, 2024): The dedicated tool handling is a recent addition to ollama so probably needs some tweaking. Looking at your logs, it would seem that what ollama is returning is not what langchain is expecting, so some digging through the code on both sides would be needed to match them up.

GiteaMirror commented

2026-04-28 14:13:38 -05:00

@marcnnn commented on GitHub (Jul 19, 2024):

I found out that Ollama sends “stop” and not "finish_reason": "tool_calls"
like groq api that I tested it against.

I was using langchain on elixir.
"finish_reason" => "tool_calls", deleting this from the Pattern match helped.

Ollama OpenAi api should answer with tool_calls as a finishing reason as well

@marcnnn commented on GitHub (Jul 19, 2024): I found out that Ollama sends “stop” and not "finish_reason": "tool_calls" like groq api that I tested it against. I was using langchain on elixir. "finish_reason" => "tool_calls", deleting this from the Pattern match helped. Ollama OpenAi api should answer with tool_calls as a finishing reason as well

GiteaMirror commented

2026-04-28 14:13:40 -05:00

@vertrue commented on GitHub (Jul 20, 2024):

Nice! Hope this will get fixed soon

@vertrue commented on GitHub (Jul 20, 2024): Nice! Hope this will get fixed soon

GiteaMirror commented

2026-04-28 14:13:42 -05:00

@KSemenenko commented on GitHub (Jul 20, 2024):

Nice! Hope this will get fixed soon

me too!

@KSemenenko commented on GitHub (Jul 20, 2024): > Nice! Hope this will get fixed soon me too!

GiteaMirror commented

2026-04-28 14:13:46 -05:00

@vertrue commented on GitHub (Jul 23, 2024):

@rick-github hi! are you in contact with someone who can fix this issue or review current PR?
seems like this bug is critical for langchain... or any other instrument that can use tools

@vertrue commented on GitHub (Jul 23, 2024): @rick-github hi! are you in contact with someone who can fix this issue or review current PR? seems like this bug is critical for langchain... or any other instrument that can use tools

GiteaMirror commented

2026-04-28 14:13:47 -05:00

@rick-github commented on GitHub (Jul 23, 2024):

Sorry, I'm not a member of the ollama team. I see that you've tagged Jeffrey, you'll to wait until he or somebody with review powers takes a look. In the meantime you'll have to build locally.

@rick-github commented on GitHub (Jul 23, 2024): Sorry, I'm not a member of the ollama team. I see that you've tagged Jeffrey, you'll to wait until he or somebody with review powers takes a look. In the meantime you'll have to build locally.

GiteaMirror commented

2026-04-28 14:13:47 -05:00

@KSemenenko commented on GitHub (Jul 23, 2024):

Llama 3.1 is here, function is here, now this is super important fix

@KSemenenko commented on GitHub (Jul 23, 2024): Llama 3.1 is here, function is here, now this is super important fix

GiteaMirror commented

2026-04-28 14:13:48 -05:00

@rick-github commented on GitHub (Jul 23, 2024):

The current version of llama3.1 doesn't support tools, https://github.com/ollama/ollama/issues/5885

@rick-github commented on GitHub (Jul 23, 2024): The current version of llama3.1 doesn't support tools, https://github.com/ollama/ollama/issues/5885

GiteaMirror commented

2026-04-28 14:13:50 -05:00

@vertrue commented on GitHub (Jul 23, 2024):

I dug deeper

when agent is executed in langchain. here is outputs right before function callings:

groq:

[llm/end] [chain:AgentExecutor > chain:RunnableSequence > llm:ChatOpenAI] [696ms] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "",
        "generation_info": {
          "finish_reason": "tool_calls",
          "model_name": "llama3-groq-70b-8192-tool-use-preview",
          "system_fingerprint": "fp_ee4b521143"
        },
        "type": "ChatGenerationChunk",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "",
            "additional_kwargs": {
              "tool_calls": [
                {
                  "index": 0,
                  "id": "call_wqy8",
                  "function": {
                    "arguments": "{\"a\": 10, \"b\": 11}",
                    "name": "function_1"
                  },
                  "type": "function"
                },
                {
                  "index": 1,
                  "id": "call_my9j",
                  "function": {
                    "arguments": "{\"a\": 10, \"b\": 11}",
                    "name": "function_1"
                  },
                  "type": "function"
                }
              ]
            },
            "response_metadata": {
              "finish_reason": "tool_calls",
              "model_name": "llama3-groq-70b-8192-tool-use-preview",
              "system_fingerprint": "fp_ee4b521143"
            },
            "type": "AIMessageChunk",
            "id": "run-1527b935-91a4-417f-93fa-696d3f184c08",
            "tool_calls": [
              {
                "name": "function_1",
                "args": {
                  "a": 10,
                  "b": 11
                },
                "id": "call_wqy8",
                "type": "tool_call"
              },
              {
                "name": "function_1",
                "args": {
                  "a": 10,
                  "b": 11
                },
                "id": "call_my9j",
                "type": "tool_call"
              }
            ],
            "tool_call_chunks": [
              {
                "name": "function_1",
                "args": "{\"a\": 10, \"b\": 11}",
                "id": "call_wqy8",
                "index": 0,
                "type": "tool_call_chunk"
              },
              {
                "name": "function_1",
                "args": "{\"a\": 10, \"b\": 11}",
                "id": "call_my9j",
                "index": 1,
                "type": "tool_call_chunk"
              }
            ],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}

ollama:

[llm/end] [chain:AgentExecutor > chain:RunnableSequence > llm:ChatOpenAI] [3.27s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "<tool_call>\n{\"id\": 0, \"name\": \"function_1\", \"arguments\": {\"a\": 10, \"b\": 11}}\n</tool_call>",
        "generation_info": {
          "finish_reason": "stop",
          "model_name": "llama3-groq-tool-use:8b",
          "system_fingerprint": "fp_ollama"
        },
        "type": "ChatGenerationChunk",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "<tool_call>\n{\"id\": 0, \"name\": \"function_1\", \"arguments\": {\"a\": 10, \"b\": 11}}\n</tool_call>",
            "response_metadata": {
              "finish_reason": "stop",
              "model_name": "llama3-groq-tool-use:8b",
              "system_fingerprint": "fp_ollama"
            },
            "type": "AIMessageChunk",
            "id": "run-2099882f-1760-4d5e-9027-04f67a656a0a",
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}

now I am not sure if it is a bug :)

@vertrue commented on GitHub (Jul 23, 2024): I dug deeper when agent is executed in langchain. here is outputs right before function callings: groq: ``` [llm/end] [chain:AgentExecutor > chain:RunnableSequence > llm:ChatOpenAI] [696ms] Exiting LLM run with output: { "generations": [ [ { "text": "", "generation_info": { "finish_reason": "tool_calls", "model_name": "llama3-groq-70b-8192-tool-use-preview", "system_fingerprint": "fp_ee4b521143" }, "type": "ChatGenerationChunk", "message": { "lc": 1, "type": "constructor", "id": [ "langchain", "schema", "messages", "AIMessageChunk" ], "kwargs": { "content": "", "additional_kwargs": { "tool_calls": [ { "index": 0, "id": "call_wqy8", "function": { "arguments": "{\"a\": 10, \"b\": 11}", "name": "function_1" }, "type": "function" }, { "index": 1, "id": "call_my9j", "function": { "arguments": "{\"a\": 10, \"b\": 11}", "name": "function_1" }, "type": "function" } ] }, "response_metadata": { "finish_reason": "tool_calls", "model_name": "llama3-groq-70b-8192-tool-use-preview", "system_fingerprint": "fp_ee4b521143" }, "type": "AIMessageChunk", "id": "run-1527b935-91a4-417f-93fa-696d3f184c08", "tool_calls": [ { "name": "function_1", "args": { "a": 10, "b": 11 }, "id": "call_wqy8", "type": "tool_call" }, { "name": "function_1", "args": { "a": 10, "b": 11 }, "id": "call_my9j", "type": "tool_call" } ], "tool_call_chunks": [ { "name": "function_1", "args": "{\"a\": 10, \"b\": 11}", "id": "call_wqy8", "index": 0, "type": "tool_call_chunk" }, { "name": "function_1", "args": "{\"a\": 10, \"b\": 11}", "id": "call_my9j", "index": 1, "type": "tool_call_chunk" } ], "invalid_tool_calls": [] } } } ] ], "llm_output": null, "run": null } ``` ollama: ``` [llm/end] [chain:AgentExecutor > chain:RunnableSequence > llm:ChatOpenAI] [3.27s] Exiting LLM run with output: { "generations": [ [ { "text": "<tool_call>\n{\"id\": 0, \"name\": \"function_1\", \"arguments\": {\"a\": 10, \"b\": 11}}\n</tool_call>", "generation_info": { "finish_reason": "stop", "model_name": "llama3-groq-tool-use:8b", "system_fingerprint": "fp_ollama" }, "type": "ChatGenerationChunk", "message": { "lc": 1, "type": "constructor", "id": [ "langchain", "schema", "messages", "AIMessageChunk" ], "kwargs": { "content": "<tool_call>\n{\"id\": 0, \"name\": \"function_1\", \"arguments\": {\"a\": 10, \"b\": 11}}\n</tool_call>", "response_metadata": { "finish_reason": "stop", "model_name": "llama3-groq-tool-use:8b", "system_fingerprint": "fp_ollama" }, "type": "AIMessageChunk", "id": "run-2099882f-1760-4d5e-9027-04f67a656a0a", "tool_calls": [], "invalid_tool_calls": [] } } } ] ], "llm_output": null, "run": null } ``` now I am not sure if it is a bug :)

GiteaMirror commented

2026-04-28 14:13:54 -05:00

@vertrue commented on GitHub (Jul 24, 2024):

found out that ollama is not parsing tools if req.Stream = true

found out that here ChatRequest.Stream is by default is true
a6cd8f6169/api/types.go (L93)

so if you are calling /v1/chat/completions, it just does not parse tools and return text response with tool in tags:
a6cd8f6169/server/routes.go (L1372)

changing
if req.Stream != nil && !*req.Stream
to
if req.Stream != nil && *req.Stream
still gives answer without tools

investigating further to see what langchain is looking for in response, because fmt.Print(len(resp.Message.ToolCalls)) right after this line
a6cd8f6169/server/routes.go (L1401)
prints 1 (not 0) to console

for me it looks like api.ChatResponse should also have field ToolCalls

@vertrue commented on GitHub (Jul 24, 2024): found out that ollama is not parsing tools if `req.Stream = true` found out that here `ChatRequest.Stream` is by default is `true` https://github.com/ollama/ollama/blob/a6cd8f6169c029c92105962017562274bd90626b/api/types.go#L93 so if you are calling `/v1/chat/completions`, it just does not parse tools and return text response with tool in tags: https://github.com/ollama/ollama/blob/a6cd8f6169c029c92105962017562274bd90626b/server/routes.go#L1372 changing `if req.Stream != nil && !*req.Stream` to `if req.Stream != nil && *req.Stream` still gives answer without tools investigating further to see what langchain is looking for in response, because `fmt.Print(len(resp.Message.ToolCalls))` right after this line https://github.com/ollama/ollama/blob/a6cd8f6169c029c92105962017562274bd90626b/server/routes.go#L1401 prints `1` (not `0`) to console for me it looks like `api.ChatResponse` should also have field `ToolCalls`

GiteaMirror commented

2026-04-28 14:13:56 -05:00

@KSemenenko commented on GitHub (Jul 24, 2024):

I found Mistarl 7B also support toolling, lets check it! maybe groc model is broken

@KSemenenko commented on GitHub (Jul 24, 2024): I found Mistarl 7B also support toolling, lets check it! maybe groc model is broken

GiteaMirror commented

2026-04-28 14:13:57 -05:00

@vertrue commented on GitHub (Jul 24, 2024):

langchain is working with chunks
and ollama does not return any chunk that includes tools

here is request to /v1/chat/completions:

{
  "messages": [
    {
      "content": "You are a helpful AI assistant that can use tools.",
      "role": "system"
    },
    {
      "content": "What is function_1(10, 11)? Also what is function_1(11, 12)? use provided tools",
      "role": "user"
    }
  ],
  "model": "llama3-groq-tool-use:8b",
  "logprobs": false,
  "n": 1,
  "stream": true,
  "temperature": 0,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "function_1",
        "description": "uses function function_1 for arguments a and b.",
        "parameters": {
          "type": "object",
          "properties": {
            "a": {
              "type": "integer"
            },
            "b": {
              "type": "integer"
            }
          },
          "required": [
            "a",
            "b"
          ]
        }
      }
    }
  ]
}

here is output of the last chunk:

{
    "id": "chatcmpl-934",
    "object": "chat.completion.chunk",
    "created": 1721826100,
    "model": "llama3-groq-tool-use:8b",
    "system_fingerprint": "fp_ollama",
    "choices": [
        {
            "index": 0,
            "delta": {
                "role": "assistant",
                "content": ""
            },
            "finish_reason": "stop"
        }
    ]
}

@vertrue commented on GitHub (Jul 24, 2024): langchain is working with chunks and ollama does not return any chunk that includes tools here is request to `/v1/chat/completions`: ``` { "messages": [ { "content": "You are a helpful AI assistant that can use tools.", "role": "system" }, { "content": "What is function_1(10, 11)? Also what is function_1(11, 12)? use provided tools", "role": "user" } ], "model": "llama3-groq-tool-use:8b", "logprobs": false, "n": 1, "stream": true, "temperature": 0, "tools": [ { "type": "function", "function": { "name": "function_1", "description": "uses function function_1 for arguments a and b.", "parameters": { "type": "object", "properties": { "a": { "type": "integer" }, "b": { "type": "integer" } }, "required": [ "a", "b" ] } } } ] } ``` here is output of the last chunk: ``` { "id": "chatcmpl-934", "object": "chat.completion.chunk", "created": 1721826100, "model": "llama3-groq-tool-use:8b", "system_fingerprint": "fp_ollama", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "" }, "finish_reason": "stop" } ] } ```

GiteaMirror commented

2026-04-28 14:13:58 -05:00

@rick-github commented on GitHub (Jul 24, 2024):

Looks like you pasted the last chunk instead of the request.

@rick-github commented on GitHub (Jul 24, 2024): Looks like you pasted the last chunk instead of the request.

GiteaMirror commented

2026-04-28 14:13:59 -05:00

@vertrue commented on GitHub (Jul 24, 2024):

@rick-github fixed

@vertrue commented on GitHub (Jul 24, 2024): @rick-github fixed

GiteaMirror commented

2026-04-28 14:14:02 -05:00

@vertrue commented on GitHub (Jul 24, 2024):

@KSemenenko fixed, I believe!
you can pull my branch if urgent

@vertrue commented on GitHub (Jul 24, 2024): @KSemenenko fixed, I believe! you can pull my branch if urgent

GiteaMirror commented

2026-04-28 14:14:03 -05:00

@vertrue commented on GitHub (Aug 8, 2024):

still waiting for PR :(

@vertrue commented on GitHub (Aug 8, 2024): still waiting for PR :(

GiteaMirror commented

2026-04-28 14:14:05 -05:00

@vertrue commented on GitHub (Aug 21, 2024):

interesting update

the following code allows to call tools without any problems. not sure if langchain-ollama uses stream=true
ollama 0.3.6, langchain-ollama 0.1.1

from langchain_ollama import ChatOllama

from langchain.agents import (
    AgentExecutor,
    create_tool_calling_agent
)

from langchain_core.prompts import ChatPromptTemplate

from langchain.globals import set_debug

from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate


@tool("function_1")
def function_1(a: int, b: int) -> int:
    """uses function function_1 for arguments a and b."""
    return a * b


@tool("function_2")
def function_2(a: int, b: int) -> int:
    """uses function function_2 for arguments a and b."""
    return a // b

set_debug(True)

llm = ChatOllama(
   model='llama3.1:70b',
   temperature=0,
   base_url=
)

tools = [function_1, function_2]

prompt = ChatPromptTemplate.from_messages([
  ("system", "You are a helpful assistant."),
  ("placeholder", "{chat_history}"),
  ("human", "{input}"),
  ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(
    llm=llm,
    tools=tools,
    prompt=prompt,
)

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    return_intermediate_steps=True,
    verbose=True,
)

agent_executor.invoke({"input": "What is function_1(10, 11)? Also, what is function_2(10, 11)?"})

@vertrue commented on GitHub (Aug 21, 2024): interesting update the following code allows to call tools without any problems. not sure if `langchain-ollama` uses `stream=true` ollama 0.3.6, langchain-ollama 0.1.1 ``` from langchain_ollama import ChatOllama from langchain.agents import ( AgentExecutor, create_tool_calling_agent ) from langchain_core.prompts import ChatPromptTemplate from langchain.globals import set_debug from langchain_core.tools import tool from langchain_core.prompts import ChatPromptTemplate @tool("function_1") def function_1(a: int, b: int) -> int: """uses function function_1 for arguments a and b.""" return a * b @tool("function_2") def function_2(a: int, b: int) -> int: """uses function function_2 for arguments a and b.""" return a // b set_debug(True) llm = ChatOllama( model='llama3.1:70b', temperature=0, base_url= ) tools = [function_1, function_2] prompt = ChatPromptTemplate.from_messages([ ("system", "You are a helpful assistant."), ("placeholder", "{chat_history}"), ("human", "{input}"), ("placeholder", "{agent_scratchpad}"), ]) agent = create_tool_calling_agent( llm=llm, tools=tools, prompt=prompt, ) agent_executor = AgentExecutor( agent=agent, tools=tools, return_intermediate_steps=True, verbose=True, ) agent_executor.invoke({"input": "What is function_1(10, 11)? Also, what is function_2(10, 11)?"}) ```

GiteaMirror commented

2026-04-28 14:14:07 -05:00

@raducoravu commented on GitHub (Nov 13, 2024):

Any idea when this will get fixed in an official ollama release? It affects me too.

@raducoravu commented on GitHub (Nov 13, 2024): Any idea when this will get fixed in an official ollama release? It affects me too.

GiteaMirror commented

2026-04-28 14:14:09 -05:00

@Mhijazi16 commented on GitHub (Nov 15, 2024):

any new updates I can't get work done becuase of this issue

@Mhijazi16 commented on GitHub (Nov 15, 2024): any new updates I can't get work done becuase of this issue

GiteaMirror commented

2026-04-28 14:14:09 -05:00

@ParthSareen commented on GitHub (Nov 19, 2024):

Hi everyone! Thanks for being patient!

I'd love to understand the use case for streamed tool calls. Would appreciate if you can attach code samples as well (can be any framework/usage).

For any tool to be called you'd need the full response from the model to make the decision for which function to call and with what parameters. And since these are not user facing, one usually just waits for the response from the model to complete.

If this is a more framework enabled concept I am a bit weary of adding that as core functionality to Ollama - but happy to reconsider.

@ParthSareen commented on GitHub (Nov 19, 2024): Hi everyone! Thanks for being patient! I'd love to understand the use case for streamed tool calls. Would appreciate if you can attach code samples as well (can be any framework/usage). For any tool to be called you'd need the full response from the model to make the decision for which function to call and with what parameters. And since these are not user facing, one usually just waits for the response from the model to complete. If this is a more framework enabled concept I am a bit weary of adding that as core functionality to Ollama - but happy to reconsider.

GiteaMirror commented

2026-04-28 14:14:11 -05:00

@codefromthecrypt commented on GitHub (Nov 19, 2024):

@ParthSareen tool calls are the main way to integrate data besides RAG (feel free to argue). Streaming is currently in use by tools like kibana and many demos that render UIs, and core kibana functionality requires tool usage to integrate data.

A good start would be to fully support streaming options. We (elastic) raised a pull request on that recently, and afterwards could consider helping on tool calls. As you can imagine, maintaining diffs is its own task, so landing one thing before another is important https://github.com/ollama/ollama/pull/6784

@codefromthecrypt commented on GitHub (Nov 19, 2024): @ParthSareen tool calls are the main way to integrate data besides RAG (feel free to argue). Streaming is currently in use by tools like kibana and many demos that render UIs, and core kibana functionality requires tool usage to integrate data. A good start would be to fully support streaming options. We (elastic) raised a pull request on that recently, and afterwards could consider helping on tool calls. As you can imagine, maintaining diffs is its own task, so landing one thing before another is important https://github.com/ollama/ollama/pull/6784

GiteaMirror commented

2026-04-28 14:14:15 -05:00

@raducoravu commented on GitHub (Nov 19, 2024):

@ParthSareen in my case I have a GUI Chat view implemented in Java, when chatting with the AI engine, the engine has various tools at its disposal that it can (if necessary) invoke from the client side. All interactions with the AI engine are done by passing the "stream":true property so that the end user receives their final answer from the AI engine gradually as it is generated.
When working with an OpenAI server directly indeed the tool calls themselves received from the server side are chunked like this:

    data: {"id":"chatcmpl-AVBUA1EJdFS0SFdzMJWdsWoLw6kIm","object":"chat.completion.chunk","created":1731995682,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_b0dd3c3254","choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_ETUPF8laB8dVR8a6AwF2gt7W","type":"function","function":{"name":"retrieve_all_action_ids","arguments":""}}],"refusal":null},"logprobs":null,"finish_reason":null}]}
    data: {"id":"chatcmpl-AVBUA1EJdFS0SFdzMJWdsWoLw6kIm","object":"chat.completion.chunk","created":1731995682,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_b0dd3c3254","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{}"}}]},"logprobs":null,"finish_reason":null}]}
    data: {"id":"chatcmpl-AVBUA1EJdFS0SFdzMJWdsWoLw6kIm","object":"chat.completion.chunk","created":1731995682,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_b0dd3c3254","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"tool_calls"}]}

It does not bother me if the tool call from the server side would be received in one chunk or in three chunks.

@raducoravu commented on GitHub (Nov 19, 2024): @ParthSareen in my case I have a GUI Chat view implemented in Java, when chatting with the AI engine, the engine has various tools at its disposal that it can (if necessary) invoke from the client side. All interactions with the AI engine are done by passing the "stream":true property so that the end user receives their final answer from the AI engine gradually as it is generated. When working with an OpenAI server directly indeed the tool calls themselves received from the server side are chunked like this: data: {"id":"chatcmpl-AVBUA1EJdFS0SFdzMJWdsWoLw6kIm","object":"chat.completion.chunk","created":1731995682,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_b0dd3c3254","choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_ETUPF8laB8dVR8a6AwF2gt7W","type":"function","function":{"name":"retrieve_all_action_ids","arguments":""}}],"refusal":null},"logprobs":null,"finish_reason":null}]} data: {"id":"chatcmpl-AVBUA1EJdFS0SFdzMJWdsWoLw6kIm","object":"chat.completion.chunk","created":1731995682,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_b0dd3c3254","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{}"}}]},"logprobs":null,"finish_reason":null}]} data: {"id":"chatcmpl-AVBUA1EJdFS0SFdzMJWdsWoLw6kIm","object":"chat.completion.chunk","created":1731995682,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_b0dd3c3254","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"tool_calls"}]} It does not bother me if the tool call from the server side would be received in one chunk or in three chunks.

GiteaMirror commented

2026-04-28 14:14:17 -05:00

@ParthSareen commented on GitHub (Nov 19, 2024):

@raducoravu @codefromthecrypt

Thanks for the quick replies! Will dig into this a bit more and hopefully provide some clarity.

@ParthSareen commented on GitHub (Nov 19, 2024): @raducoravu @codefromthecrypt Thanks for the quick replies! Will dig into this a bit more and hopefully provide some clarity.

GiteaMirror commented

2026-04-28 14:14:19 -05:00

@tzolov commented on GitHub (Nov 19, 2024):

@ParthSareen,
Based on our Spring AI implementation experience with various AI providers:
Because the tool_call messages require complete JSON content before processing, we pre-aggregate only the tool_call chunks into single messages, while keeping regular text responses streaming.
This satisfies both requirements - complete tool calls and streamed final responses.
It would have been nice if the providers did the JSON aggregation on the server/model side.

Note: Initially, we used to switch to non-streaming after detecting the first tool calls message, but user feedback favored keeping the final text responses streamed.

Ollama is an amazing tool! Looking forward to extending our Ollama Function Calling support with streaming when it's ready!

@tzolov commented on GitHub (Nov 19, 2024): @ParthSareen, Based on our [Spring AI](https://docs.spring.io/spring-ai/reference/api/functions.html) implementation experience with various AI providers: Because the tool_call messages require complete JSON content before processing, we pre-aggregate only the tool_call chunks into single messages, while keeping regular text responses streaming. This satisfies both requirements - complete tool calls and streamed final responses. It would have been nice if the providers did the JSON aggregation on the server/model side. Note: Initially, we used to switch to non-streaming after detecting the first tool calls message, but user feedback favored keeping the final text responses streamed. Ollama is an amazing tool! Looking forward to extending our [Ollama Function Calling](https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html#_function_calling) support with streaming when it's ready!

GiteaMirror commented

2026-04-28 14:14:21 -05:00

@edmcman commented on GitHub (Nov 19, 2024):

My situation is very similar to @raducoravu's. I have a chat and want to stream the chat results. I don't care about the tool results streaming. (It's hard to imagine an application where streaming the tool results is important...)

@edmcman commented on GitHub (Nov 19, 2024): My situation is very similar to @raducoravu's. I have a chat and want to stream the chat results. I don't care about the tool results streaming. (It's hard to imagine an application where streaming the tool results is important...)

GiteaMirror commented

2026-04-28 14:14:22 -05:00

@codefromthecrypt commented on GitHub (Nov 20, 2024):

PSA not everyone has the influence or skill to hunt down the calls sites that use a particular API. Sometimes they are buried in frameworks or otherwise not easy to change. In any case, it costs a significant amount of downstream effort to discover this limitation and then possibly make comments as we've seen. If we look at this issue, we can see a myriad of projects linking problems found to it.

What I'm curious about is related to other parts of openai. When there is incentive (I think we can agree there is incentive here, even if some arguments about the practice).. is it possible for someone to raise a PR and complete a change?

More and more products are normalizing on openai as a portability layer, and that doesn't mean each agrees with all the API decisions. I guess what I mean to say is how much stake is there in not completing this, or allowing it to be completed by someone else?

@codefromthecrypt commented on GitHub (Nov 20, 2024): PSA not everyone has the influence or skill to hunt down the calls sites that use a particular API. Sometimes they are buried in frameworks or otherwise not easy to change. In any case, it costs a significant amount of downstream effort to discover this limitation and then possibly make comments as we've seen. If we look at this issue, we can see a myriad of projects linking problems found to it. What I'm curious about is related to other parts of openai. When there is incentive (I think we can agree there is incentive here, even if some arguments about the practice).. is it possible for someone to raise a PR and complete a change? More and more products are normalizing on openai as a portability layer, and that doesn't mean each agrees with all the API decisions. I guess what I mean to say is how much stake is there in not completing this, or allowing it to be completed by someone else?

GiteaMirror commented

2026-04-28 14:14:25 -05:00

@ParthSareen commented on GitHub (Nov 20, 2024):

Hey everyone!

Thank you for raising some great points - we'll be working over the next little bit to get this in!

Still figuring out the exact details as it could potentially break some experiences. But this is definitely high up on my list - thankful for you all to bring it up. It's all about making the experience better for you all while having good engineering decisions.

@ParthSareen commented on GitHub (Nov 20, 2024): Hey everyone! Thank you for raising some great points - we'll be working over the next little bit to get this in! Still figuring out the exact details as it could potentially break some experiences. But this is definitely high up on my list - thankful for you all to bring it up. It's all about making the experience better for you all while having good engineering decisions.

GiteaMirror commented

2026-04-28 14:14:28 -05:00

@jackmpcollins commented on GitHub (Nov 20, 2024):

For any tool to be called you'd need the full response from the model to make the decision for which function to call and with what parameters. And since these are not user facing, one usually just waits for the response from the model to complete.

@ParthSareen I have a use case that would benefit from streaming the tool call arguments, like openai does. In https://github.com/jackmpcollins/magentic tool calling is used to generate structured outputs. When an iterable of structured objects X is requested, under the hood magentic submits a tool with return type list[X], and as the arguments are being streamed back each item is parsed out and yielded when it completes. The advantage of this approach is structured items can start being displayed in UI (or acted on in other ways) without waiting for the whole generation to have finished.

Some details about this in the docs here https://magentic.dev/streaming/#object-streaming with example code:

from collections.abc import Iterable
from time import time

from magentic import prompt
from pydantic import BaseModel


class Superhero(BaseModel):
    name: str
    age: int
    power: str
    enemies: list[str]


@prompt("Create a Superhero team named {name}.")
def create_superhero_team(name: str) -> Iterable[Superhero]: ...


start_time = time()
for hero in create_superhero_team("The Food Dudes"):
    print(f"{time() - start_time:.2f}s : {hero}")

# 2.23s : name='Pizza Man' age=30 power='Can shoot pizza slices from his hands' enemies=['The Hungry Horde', 'The Junk Food Gang']
# 4.03s : name='Captain Carrot' age=35 power='Super strength and agility from eating carrots' enemies=['The Sugar Squad', 'The Greasy Gang']
# 6.05s : name='Ice Cream Girl' age=25 power='Can create ice cream out of thin air' enemies=['The Hot Sauce Squad', 'The Healthy Eaters']

A similar approach is used in https://github.com/jxnl/instructor for the "partial responses" feature. More details in docs here https://python.useinstructor.com/concepts/partial/

@jackmpcollins commented on GitHub (Nov 20, 2024): > For any tool to be called you'd need the full response from the model to make the decision for which function to call and with what parameters. And since these are not user facing, one usually just waits for the response from the model to complete. @ParthSareen I have a use case that would benefit from streaming the tool call arguments, like openai does. In https://github.com/jackmpcollins/magentic tool calling is used to generate structured outputs. When an iterable of structured objects `X` is requested, under the hood magentic submits a tool with return type `list[X]`, and as the arguments are being streamed back each item is parsed out and yielded when it completes. The advantage of this approach is structured items can start being displayed in UI (or acted on in other ways) without waiting for the whole generation to have finished. Some details about this in the docs here https://magentic.dev/streaming/#object-streaming with example code: ```python from collections.abc import Iterable from time import time from magentic import prompt from pydantic import BaseModel class Superhero(BaseModel): name: str age: int power: str enemies: list[str] @prompt("Create a Superhero team named {name}.") def create_superhero_team(name: str) -> Iterable[Superhero]: ... start_time = time() for hero in create_superhero_team("The Food Dudes"): print(f"{time() - start_time:.2f}s : {hero}") # 2.23s : name='Pizza Man' age=30 power='Can shoot pizza slices from his hands' enemies=['The Hungry Horde', 'The Junk Food Gang'] # 4.03s : name='Captain Carrot' age=35 power='Super strength and agility from eating carrots' enemies=['The Sugar Squad', 'The Greasy Gang'] # 6.05s : name='Ice Cream Girl' age=25 power='Can create ice cream out of thin air' enemies=['The Hot Sauce Squad', 'The Healthy Eaters'] ``` A similar approach is used in https://github.com/jxnl/instructor for the "partial responses" feature. More details in docs here https://python.useinstructor.com/concepts/partial/

GiteaMirror commented

2026-04-28 14:14:31 -05:00

@lucaskatayama commented on GitHub (Nov 21, 2024):

Hey guys... Sorry didn't read the entire thread... But I think I am in the right thread.

I am trying to get langchain to receive chunks when using agents... basically I need ollama to accept stream when using tools...
I achieved that by doing the changes below:

Modify the Modelfile to prefix the tool response

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the exactly format below:
\f{"name": function name, "parameters": dictionary of argument name and its value}

Do not use variables
Do NOT forget the \f at the beginning.

Modify the ollama server to identify the prefix \f and join all chunks into a tool message response. Sending through stream
https://github.com/ollama/ollama/compare/main...lucaskatayama:ollama:feat/tool-stream?expand=1
Modify langchain_ollama . Change stream=False to stream=True

 if "tools" in kwargs:
            async for part in await self._async_client.chat( <<<<<<<<
                model=params["model"],
                messages=ollama_messages,
                stream=True, <<<<<<< 
                options=Options(**params["options"]),
                keep_alive=params["keep_alive"],
                format=params["format"],
                tools=kwargs["tools"],
            ) :
                yield part # type:ignore

if "tools" in kwargs:
            yield from self._client.chat( <<<<<<<<<<<<<
                model=params["model"],
                messages=ollama_messages,
                stream=True, <<<<<<<<<<<<<
                options=Options(**params["options"]),
                keep_alive=params["keep_alive"],
                format=params["format"],
                tools=kwargs["tools"],
            )

I am contributing with an idea. Don't know if thi is the right way..

@lucaskatayama commented on GitHub (Nov 21, 2024): Hey guys... Sorry didn't read the entire thread... But I think I am in the right thread. I am trying to get langchain to receive chunks when using agents... basically I need ollama to accept stream when using tools... I achieved that by doing the changes below: 1. Modify the Modelfile to prefix the tool response ``` Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt. Respond in the exactly format below: \f{"name": function name, "parameters": dictionary of argument name and its value} Do not use variables Do NOT forget the \f at the beginning. ``` 2. Modify the ollama server to identify the prefix \f and join all chunks into a tool message response. Sending through stream https://github.com/ollama/ollama/compare/main...lucaskatayama:ollama:feat/tool-stream?expand=1 3. Modify langchain_ollama . Change `stream=False` to `stream=True` ``` if "tools" in kwargs: async for part in await self._async_client.chat( <<<<<<<< model=params["model"], messages=ollama_messages, stream=True, <<<<<<< options=Options(**params["options"]), keep_alive=params["keep_alive"], format=params["format"], tools=kwargs["tools"], ) : yield part # type:ignore ``` ``` if "tools" in kwargs: yield from self._client.chat( <<<<<<<<<<<<< model=params["model"], messages=ollama_messages, stream=True, <<<<<<<<<<<<< options=Options(**params["options"]), keep_alive=params["keep_alive"], format=params["format"], tools=kwargs["tools"], ) ```` I am contributing with an idea. Don't know if thi is the right way..

GiteaMirror commented

2026-04-28 14:14:32 -05:00

@edmcman commented on GitHub (Nov 21, 2024):

It would be best to not require changing the modelfiles...

On Thu, Nov 21, 2024 at 9:46 AM Lucas Katayama - @.***
@.***> wrote:

Hey guys... Sorry didn't read the entire thread... But I think I am in the
right thread.

I am trying to get langchain to receive chunks when using agents...
basically I need ollama to accept stream when using tools...
I achieved that by doing the changes below:

Modify the Modelfile to prefix the tool response

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the exactly format below:
\f{"name": function name, "parameters": dictionary of argument name and its value}

Do not use variables
Do NOT forget the \f at the beginning.

Modify the ollama server to identify the prefix \f and join all chunks
into a tool message response. Sending through stream

https://github.com/ollama/ollama/compare/main...lucaskatayama:ollama:feat/tool-stream?expand=1
3.

Modify langchain_ollama . Change stream=False to stream=True

if "tools" in kwargs:
async for part in await self._async_client.chat( <<<<<<<<
model=params["model"],
messages=ollama_messages,
stream=True, <<<<<<<
options=Options(**params["options"]),
keep_alive=params["keep_alive"],
format=params["format"],
tools=kwargs["tools"],
) :
yield part # type:ignore

if "tools" in kwargs:
yield from self._client.chat( <<<<<<<<<<<<<
model=params["model"],
messages=ollama_messages,
stream=True, <<<<<<<<<<<<<
options=Options(**params["options"]),
keep_alive=params["keep_alive"],
format=params["format"],
tools=kwargs["tools"],
)

I am contributing with an idea. Don't know if thi is the right way..

—
Reply to this email directly, view it on GitHub
https://github.com/ollama/ollama/issues/5796#issuecomment-2491425669,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAHYKZKQSTUVQTF2VWOL5OD2BXW5FAVCNFSM6AAAAABLE3PH5GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOJRGQZDKNRWHE
.
You are receiving this because you are subscribed to this thread.Message
ID: @.***>

@edmcman commented on GitHub (Nov 21, 2024): It would be best to not require changing the modelfiles... On Thu, Nov 21, 2024 at 9:46 AM Lucas Katayama - ***@***.*** ***@***.***> wrote: > Hey guys... Sorry didn't read the entire thread... But I think I am in the > right thread. > > I am trying to get langchain to receive chunks when using agents... > basically I need ollama to accept stream when using tools... > I achieved that by doing the changes below: > > 1. Modify the Modelfile to prefix the tool response > > Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt. > > Respond in the exactly format below: > \f{"name": function name, "parameters": dictionary of argument name and its value} > > Do not use variables > Do NOT forget the \f at the beginning. > > > 2. > > Modify the ollama server to identify the prefix \f and join all chunks > into a tool message response. Sending through stream > > https://github.com/ollama/ollama/compare/main...lucaskatayama:ollama:feat/tool-stream?expand=1 > 3. > > Modify langchain_ollama . Change stream=False to stream=True > > if "tools" in kwargs: > async for part in await self._async_client.chat( <<<<<<<< > model=params["model"], > messages=ollama_messages, > stream=True, <<<<<<< > options=Options(**params["options"]), > keep_alive=params["keep_alive"], > format=params["format"], > tools=kwargs["tools"], > ) : > yield part # type:ignore > > if "tools" in kwargs: > yield from self._client.chat( <<<<<<<<<<<<< > model=params["model"], > messages=ollama_messages, > stream=True, <<<<<<<<<<<<< > options=Options(**params["options"]), > keep_alive=params["keep_alive"], > format=params["format"], > tools=kwargs["tools"], > ) > > I am contributing with an idea. Don't know if thi is the right way.. > > — > Reply to this email directly, view it on GitHub > <https://github.com/ollama/ollama/issues/5796#issuecomment-2491425669>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AAHYKZKQSTUVQTF2VWOL5OD2BXW5FAVCNFSM6AAAAABLE3PH5GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOJRGQZDKNRWHE> > . > You are receiving this because you are subscribed to this thread.Message > ID: ***@***.***> >

GiteaMirror commented

2026-04-28 14:14:34 -05:00

@ParthSareen commented on GitHub (Nov 28, 2024):

Hey everyone! Thanks for being so patient :) New release just went out with streaming tool call support. Will ping some folks around the community so they don't have to work around it.

Appreciate all the insight for this issue! https://github.com/ollama/ollama/releases/tag/v0.4.6

Quick Notes:

Each chunk returned to the user will contain a tool call (if any)
Multiple tool calls can be returned in a streamed manner

@ParthSareen commented on GitHub (Nov 28, 2024): Hey everyone! Thanks for being so patient :) New release just went out with streaming tool call support. Will ping some folks around the community so they don't have to work around it. Appreciate all the insight for this issue! https://github.com/ollama/ollama/releases/tag/v0.4.6 Quick Notes: - Each chunk returned to the user will contain a tool call (if any) - Multiple tool calls can be returned in a streamed manner

GiteaMirror commented

2026-04-28 14:14:35 -05:00

@edmcman commented on GitHub (Nov 28, 2024):

Thank you!

On Wed, Nov 27, 2024, 9:41 PM Parth Sareen - @.***
@.***> wrote:

Hey everyone! Thanks for being so patient :) New release just went out
with streaming tool call support. Will ping some folks around the community
so they don't have to work around it.

Appreciate all the insight for this issue!
https://github.com/ollama/ollama/releases/tag/v0.4.6

—
Reply to this email directly, view it on GitHub
https://github.com/ollama/ollama/issues/5796#issuecomment-2505152208,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAHYKZKVJSBHBE33V33BBUD2CZ7EJAVCNFSM6AAAAABLE3PH5GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMBVGE2TEMRQHA
.
You are receiving this because you are subscribed to this thread.Message
ID: @.***>

@edmcman commented on GitHub (Nov 28, 2024): Thank you! On Wed, Nov 27, 2024, 9:41 PM Parth Sareen - ***@***.*** ***@***.***> wrote: > Hey everyone! Thanks for being so patient :) New release just went out > with streaming tool call support. Will ping some folks around the community > so they don't have to work around it. > > Appreciate all the insight for this issue! > https://github.com/ollama/ollama/releases/tag/v0.4.6 > > — > Reply to this email directly, view it on GitHub > <https://github.com/ollama/ollama/issues/5796#issuecomment-2505152208>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AAHYKZKVJSBHBE33V33BBUD2CZ7EJAVCNFSM6AAAAABLE3PH5GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMBVGE2TEMRQHA> > . > You are receiving this because you are subscribed to this thread.Message > ID: ***@***.***> >

GiteaMirror commented

2026-04-28 14:14:38 -05:00

@jackmpcollins commented on GitHub (Nov 29, 2024):

@ParthSareen I opened an issue for adding the index to each tool call as this breaks compatibility for some use cases. https://github.com/ollama/ollama/issues/7881 Otherwise it is working well! Thank you

@jackmpcollins commented on GitHub (Nov 29, 2024): @ParthSareen I opened an issue for adding the `index` to each tool call as this breaks compatibility for some use cases. https://github.com/ollama/ollama/issues/7881 Otherwise it is working well! Thank you

GiteaMirror commented

2026-04-28 14:14:38 -05:00

@ParthSareen commented on GitHub (Nov 29, 2024):

@ParthSareen I opened an issue for adding the index to each tool call as this breaks compatibility for some use cases. https://github.com/ollama/ollama/issues/7881 Otherwise it is working well! Thank you

Ahh dang must have missed that field! Will add in the AM thanks for the ping!

@ParthSareen commented on GitHub (Nov 29, 2024): > @ParthSareen I opened an issue for adding the `index` to each tool call as this breaks compatibility for some use cases. https://github.com/ollama/ollama/issues/7881 Otherwise it is working well! Thank you Ahh dang must have missed that field! Will add in the AM thanks for the ping!

GiteaMirror commented

2026-04-28 14:14:42 -05:00

@Rizaldy commented on GitHub (Nov 29, 2024):

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
  "model": "qwen2.5:latest",
  "messages": [
    {
      "role": "user",
      "content": "What is your name? please explain in detail in 2 paragraph"
    }
  ],
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location", "format"]
        }
      }
    }
  ]
}'

data: {"id":"chatcmpl-635","object":"chat.completion.chunk","created":1732905314,"model":"qwen2.5:latest","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"I am Qwen, an AI assistant created by Alibaba Cloud. My purpose is to assist and interact with users like you to provide information, answer questions, generate content, and perform various tasks. I don't have a personal identity or name outside of this context, but \"Qwen\" serves as my identifier for the purposes of communication. If you have any specific inquiries or need help with something, feel free to ask!"},"finish_reason":"stop"}]}

data: [DONE]

Hi @ParthSareen want to add feedback after updated Ollama to 0.4.6, as the above:

Add tool on the payload
Ask non related tools
Qwen answer with conversational
the response is not stream in chunk but in one content

But if I remove tools from payload it stream like normal. Just want to know if this is a new implementations or something is missing?

@Rizaldy commented on GitHub (Nov 29, 2024): ```bash curl http://localhost:11434/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "qwen2.5:latest", "messages": [ { "role": "user", "content": "What is your name? please explain in detail in 2 paragraph" } ], "stream": true, "tools": [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The location to get the weather for, e.g. San Francisco, CA" }, "format": { "type": "string", "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'", "enum": ["celsius", "fahrenheit"] } }, "required": ["location", "format"] } } } ] }' data: {"id":"chatcmpl-635","object":"chat.completion.chunk","created":1732905314,"model":"qwen2.5:latest","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"I am Qwen, an AI assistant created by Alibaba Cloud. My purpose is to assist and interact with users like you to provide information, answer questions, generate content, and perform various tasks. I don't have a personal identity or name outside of this context, but \"Qwen\" serves as my identifier for the purposes of communication. If you have any specific inquiries or need help with something, feel free to ask!"},"finish_reason":"stop"}]} data: [DONE] ``` Hi @ParthSareen want to add feedback after updated Ollama to 0.4.6, as the above: 1. Add tool on the payload 2. Ask non related tools 3. Qwen answer with conversational 4. the response is not `stream in chunk` but in one `content` But if I remove `tools` from payload it stream like normal. Just want to know if this is a new implementations or something is missing?

GiteaMirror commented

2026-04-28 14:14:44 -05:00

@ParthSareen commented on GitHub (Nov 30, 2024):

@Rizaldy Yes in streaming mode this is expected. Essentially we don't know when a tool is going to come back from a model. If there is a toolcall present, the content should be removed and only the call should be sent back, and if there is no toolcall we should return whatever content was returned by the model. Hope this helps!

@ParthSareen commented on GitHub (Nov 30, 2024): @Rizaldy Yes in streaming mode this is expected. Essentially we don't know when a tool is going to come back from a model. If there is a toolcall present, the content should be removed and only the call should be sent back, and if there is no toolcall we should return whatever content was returned by the model. Hope this helps!

GiteaMirror commented

2026-04-28 14:14:46 -05:00

@RippinRocket commented on GitHub (Nov 30, 2024):

@ParthSareen I'm seeing the same thing using the python library. If I don't include tools in the payload, the response content streams across token by token. If I do include tools, I get a streamed response but it contains the full response content in one go instead of token by token.

@RippinRocket commented on GitHub (Nov 30, 2024): @ParthSareen I'm seeing the same thing using the python library. If I don't include tools in the payload, the response content streams across token by token. If I do include tools, I get a streamed response but it contains the full response content in one go instead of token by token.

GiteaMirror commented

2026-04-28 14:14:47 -05:00

@ParthSareen commented on GitHub (Nov 30, 2024):

Hey @Rizaldy @RippinRocket,

We wanted to get a quick implementation out to unblock people on this. Will scope something in to eventually identify a bit earlier whether tool calls are coming back or not and then stream rest of the response out (tracking in: https://github.com/ollama/ollama/issues/7886).

In the meantime I'd recommend to pass tools in when needed and less for chatting - especially with small models as they overfit to sending tool responses back anyways. Appreciate y'all raising this!

@ParthSareen commented on GitHub (Nov 30, 2024): Hey @Rizaldy @RippinRocket, We wanted to get a quick implementation out to unblock people on this. Will scope something in to eventually identify a bit earlier whether tool calls are coming back or not and then stream rest of the response out (tracking in: https://github.com/ollama/ollama/issues/7886). In the meantime I'd recommend to pass tools in when needed and less for chatting - especially with small models as they overfit to sending tool responses back anyways. Appreciate y'all raising this!

GiteaMirror commented

2026-04-28 14:14:48 -05:00

@saivishwak commented on GitHub (Feb 28, 2025):

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is your name? please explain in detail in 2 paragraph"
    }
  ],
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location", "format"]
        }
      }
    }
  ]
}'



data: {"id":"chatcmpl-432","object":"chat.completion.chunk","created":1740745072,"model":"llama3.2","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"","tool_calls":[{"id":"call_zd4aa4tf","index":0,"type":"function","function":{"name":"get_current_weather","arguments":"{\"format\":\"none\",\"location\":\"assistant AI system\"}"}}]},"finish_reason":null}]}

data: {"id":"chatcmpl-432","object":"chat.completion.chunk","created":1740745072,"model":"llama3.2","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]}

data: [DONE]

When using tools and stream, the reponse has tools_call even when the query is non tool related, Is this expected issue?

@saivishwak commented on GitHub (Feb 28, 2025): ```sh curl http://localhost:11434/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "llama3.2", "messages": [ { "role": "user", "content": "What is your name? please explain in detail in 2 paragraph" } ], "stream": true, "tools": [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The location to get the weather for, e.g. San Francisco, CA" }, "format": { "type": "string", "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'", "enum": ["celsius", "fahrenheit"] } }, "required": ["location", "format"] } } } ] }' data: {"id":"chatcmpl-432","object":"chat.completion.chunk","created":1740745072,"model":"llama3.2","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"","tool_calls":[{"id":"call_zd4aa4tf","index":0,"type":"function","function":{"name":"get_current_weather","arguments":"{\"format\":\"none\",\"location\":\"assistant AI system\"}"}}]},"finish_reason":null}]} data: {"id":"chatcmpl-432","object":"chat.completion.chunk","created":1740745072,"model":"llama3.2","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]} data: [DONE] ``` When using tools and stream, the reponse has tools_call even when the query is non tool related, Is this expected issue?

GiteaMirror commented

2026-04-28 14:14:49 -05:00

@ParthSareen commented on GitHub (Mar 3, 2025):

Hey @saivishwak, this is just model behavior. Smaller models when provided tools tend to lean towards making tool calls than not. If you're constrained to using small models, I'd recommend adding a client to manage the other responses.

@ParthSareen commented on GitHub (Mar 3, 2025): Hey @saivishwak, this is just model behavior. Smaller models when provided tools tend to lean towards making tool calls than not. If you're constrained to using small models, I'd recommend adding a client to manage the other responses.

GiteaMirror commented

2026-04-28 14:14:49 -05:00

@edmcman commented on GitHub (Mar 3, 2025):

@ParthSareen wrote:

Smaller models when provided tools tend to lean towards making tool calls than not.

This may be true, but it's certainly not the only thing going on here. Ollama is using a poorly performing prompt template. See https://edmcman.github.io/blog/2025-02-21--lang-chain-and-ollama-make-building-local-tool-calling-agents-easy-it-s-a-shame-they-don-t-work-part-2/ -- very curious to hear your thoughts.

@saivishwak You might want to try https://ollama.com/ejschwar/llama3.2-better-prompts or use a different host than Ollama. You can test llama 3.2 on groq pretty easily for free. For instance, I found that on Ollama, llama3.2 regularly responds to "Hello" with a tool call. On groq/llama.cpp, it does not. I believe it all boils down to the prompt template.

@edmcman commented on GitHub (Mar 3, 2025): @ParthSareen wrote: > Smaller models when provided tools tend to lean towards making tool calls than not. This may be true, but it's certainly not the only thing going on here. Ollama is using a poorly performing prompt template. See https://edmcman.github.io/blog/2025-02-21--lang-chain-and-ollama-make-building-local-tool-calling-agents-easy-it-s-a-shame-they-don-t-work-part-2/ -- very curious to hear your thoughts. @saivishwak You might want to try https://ollama.com/ejschwar/llama3.2-better-prompts or use a different host than Ollama. You can test llama 3.2 on groq pretty easily for free. For instance, I found that on Ollama, `llama3.2` regularly responds to "Hello" with a tool call. On groq/llama.cpp, it does not. I believe it all boils down to the prompt template.

GiteaMirror commented

2026-04-28 14:14:50 -05:00

@ParthSareen commented on GitHub (Mar 3, 2025):

@edmcman Cool work on hacking on the template! Llama3.2 uses a python function for which we don't parse as of yet. It probably explains some of the difference in behavior.

@ParthSareen commented on GitHub (Mar 3, 2025): @edmcman Cool work on hacking on the template! Llama3.2 uses a python function for which we don't parse as of yet. It probably explains some of the difference in behavior.

Sign in to join this conversation.

Branches Tags

main

hoyyeva/fix-claude-channels-env

parth-update-hermes-launch

hoyyeva/vscode-extension-docs-update

parth-gemma4-chat-template-renderer

parth-api-status-context-length

hoyyeva/wire-up-context-length

hoyyeva/claude-code-context-doc

jmorganca/investigate-issue-17046

hoyyeva/hermes-docs

jmorganca/agent-loop-style

hoyyeva/openclaw

parth-agent-loop

hoyyeva/ollama-vscode-extension

brucemacd/cache-metrics

brucemacd/hermes-desktop

hoyyeva/docs-vscode

parth-input-style-experiment

brucemacd/docs-glm52

hoyyeva/poc-docs

Parth/mlx-launch-recommendations

parth-first-time-app-cli-experience

test/darwin-xcode-pin

improve-cloud-model-recommendations

hoyyeva/goose-docs

jmorganca/context-limit-fixes

hoyyeva/qwen-doc

hoyyeva/vscode-docs

jmorganca/remove-mlx-imagegen-code

parth-copilot-token-length-defaults

hoyyeva/poolside-windows

laguna-support

jmorganca/harden-markdown-rendering

laguna-renderer-parser

laguna-llamacpp

codex/make-integration-hidden-and-lunchable

brucemacd/omp-docs

pdevine/gguf-mtp-oldstyle

hoyyeva/migrate-pi

hoyyeva/anthropic-local-image-path

parth-launch-codex-app

hoyyeva/anthropic-reference-images-path

parth-anthropic-reference-images-path

brucemacd/download-before-remove

hoyyeva/editor-config-repair

parth-mlx-decode-checkpoints

parth/hide-claude-desktop-till-release

parth-add-claude-code-autoinstall

release_v0.22.0

pdevine/manifest-list

codex/fix-codex-model-metadata-warning

pdevine/addressable-manifest

brucemacd/launch-fetch-reccomended

jmorganca/llama-compat

launch-copilot-cli

release_v0.20.7

parth-auto-save-backup

parth-test

jmorganca/gemma4-audio-replacements

fix-manifest-digest-on-pull

hoyyeva/vscode-improve

brucemacd/install-server-wait

parth/update-claude-docs

brucemac/start-ap-install

pdevine/mlx-update

pdevine/qwen35_vision

drifkin/api-show-fallback

mintlify/image-generation-1773352582

hoyyeva/server-context-length-local-config

jmorganca/faster-reptition-penalties

jmorganca/convert-nemotron

parth-pi-thinking

pdevine/sampling-penalties

jmorganca/fix-create-quantization-memory

dongchen/resumable_transfer_fix

pdevine/sampling-cache-error

jessegross/mlx-usage

hoyyeva/openclaw-config

hoyyeva/app-html

pdevine/qwen3next

brucemacd/sign-sh-install

brucemacd/tui-update

brucemacd/usage-api

jmorganca/launch-empty

fix-app-dist-embed

mxyng/mlx-compile

mxyng/mlx-quant

mxyng/mlx-glm4.7

mxyng/mlx

brucemacd/simplify-model-picker

jmorganca/qwen3-concurrent

fix-glm-4.7-flash-mla-config

drifkin/qwen3-coder-opening-tag

brucemacd/usage-cli

fix-cuda12-fattn-shmem

ollama-imagegen-docs

parth/fix-multiline-inputs

brucemacd/config-docs

mxyng/model-files

mxyng/simple-execute

fix-imagegen-ollama-models

mxyng/async-upload

jmorganca/lazy-no-dtype-changes

imagegen-auto-detect-create

parth/decrease-concurrent-download-hf

fix-mlx-quantize-init

jmorganca/x-cleanup

usage

imagegen-readme

jmorganca/glm-image

mlx-gpu-cd

jmorganca/imagegen-modelfile

parth/agent-skills

parth/agent-allowlist

parth/signed-in-offline

parth/agents

parth/fix-context-chopping

improve-cloud-flow

parth/add-models-websearch

parth/prompt-renderer-mcp

jmorganca/native-settings

jmorganca/download-stream-hash

jmorganca/client2-rebased

brucemacd/oai-chat-req-multipart

jessegross/multi_chunk_reserve

grace/additional-omit-empty

grace/mistral-3-large

mxyng/tokenizer2

mxyng/tokenizer

jessegross/flash

hoyyeva/windows-nacked-app

mxyng/cleanup-attention

grace/deepseek-parser

hoyyeva/remember-unsent-prompt

parth/add-lfs-pointer-error-conversion

parth/olmo2-test2

hoyyeva/ollama-launchagent-plist

nicole/olmo-model

parth/olmo-test

mxyng/remove-embedded

parth/render-template

jmorganca/intellect-3

parth/remove-prealloc-linter

jmorganca/cmd-eval

nicole/nomic-embed-text-fix

mxyng/lint-2

hoyyeva/add-gemini-3-pro-preview

hoyyeva/load-model-list

mxyng/expand-path

mxyng/environ-2

hoyyeva/deeplink-json-encoding

parth/improve-tool-calling-tests

hoyyeva/conversation

hoyyeva/assistant-edit-response

hoyyeva/thinking

origin/brucemacd/invalid-char-i-err

parth/improve-tool-calling

jmorganca/required-omitempty

grace/qwen3-vl-tests

mxyng/iter-client

parth/docs-readme

nicole/embed-test

pdevine/integration-benchstat

parth/remove-generate-cmd

parth/add-toolcall-id

mxyng/server-tests

jmorganca/glm-4.6

jmorganca/gin-h-compat

drifkin/stable-tool-args

pdevine/qwen3-more-thinking

parth/add-websearch-client

nicole/websearch_local

jmorganca/qwen3-coder-updates

grace/deepseek-v3-migration-tests

mxyng/fix-create

jmorganca/cloud-errors

pdevine/parser-tidy

revert-12233-parth/simplify-entrypoints-runner

parth/enable-so-gpt-oss

brucemacd/qwen3vl

jmorganca/readme-simplify

parth/gpt-oss-structured-outputs

revert-12039-jmorganca/tools-braces

mxyng/embeddings

mxyng/gguf

mxyng/benchmark

mxyng/types-null

parth/move-parsing

mxyng/gemma2

jmorganca/docs

mxyng/16-bit

mxyng/create-stdin

pdevine/authorizedkeys

mxyng/quant

parth/opt-in-error-context-window

brucemacd/cache-models

brucemacd/runner-completion

jmorganca/llama-update-6

brucemacd/benchmark-list

brucemacd/partial-read-caps

parth/deepseek-r1-tools

mxyng/omit-array

parth/tool-prefix-temp

brucemacd/runner-test

jmorganca/qwen25vl

brucemacd/model-forward-test-ext

parth/python-function-parsing

jmorganca/cuda-compression-none

drifkin/num-parallel

drifkin/chat-truncation-fix

jmorganca/sync

parth/python-tools-calling

drifkin/array-head-count

brucemacd/create-no-loop

parth/server-enable-content-stream-with-tools

qwen25omni

mxyng/v3

brucemacd/ropeconfig

jmorganca/silence-tokenizer

parth/sample-so-test

parth/sampling-structured-outputs

brucemacd/doc-go-engine

parth/constrained-sampling-json

jmorganca/mistral-wip

brucemacd/mistral-small-convert

parth/sample-unmarshal-json-for-params

brucemacd/jomorganca/mistral

pdevine/bfloat16

jmorganca/mistral

brucemacd/mistral

pdevine/logging

parth/sample-correctness-fix

parth/sample-fix-sorting

jmorgan/sample-fix-sorting-extras

jmorganca/temp-0-images

brucemacd/parallel-embed-models

brucemacd/shim-grammar

jmorganca/fix-gguf-error

bmizerany/nameswork

jmorganca/faster-releases

bmizerany/validatenames

brucemacd/err-no-vocab

brucemacd/rope-config

brucemacd/err-hint

brucemacd/qwen2_5

brucemacd/logprobs

brucemacd/new_runner_graph_bench

progress-flicker

brucemacd/forward-test

brucemacd/go_qwen2

pdevine/gemma2

jmorganca/add-missing-symlink-eval

mxyng/next-debug

parth/set-context-size-openai

brucemacd/next-bpe-bench

brucemacd/next-bpe-test

brucemacd/new_runner_e2e

brucemacd/new_runner_qwen2

pdevine/convert-cohere2

brucemacd/convert-cli

parth/log-probs

mxyng/next-mlx

mxyng/cmd-history

parth/templating

parth/tokenize-detokenize

brucemacd/check-key-register

bmizerany/grammar

jmorganca/vendor-081b29bd

mxyng/func-checks

jmorganca/fix-null-format

parth/fix-default-to-warn-json

jmorganca/qwen2vl

jmorganca/no-concat

parth/cmd-cleanup-SO

brucemacd/check-key-register-structured-err

parth/openai-stream-usage

parth/fix-referencing-so

stream-tools-stop

jmorganca/degin-1

brucemacd/install-path-clean

brucemacd/push-name-validation

brucemacd/browser-key-register

jmorganca/openai-fix-first-message

jmorganca/fix-proxy

jessegross/sample

parth/disallow-streaming-tools

dhiltgen/remove_submodule

jmorganca/ga

jmorganca/mllama

pdevine/newlines

pdevine/geems-2b

jmorganca/llama-bump

mxyng/modelname-7

mxyng/gin-slog

mxyng/modelname-6

jyan/convert-prog

jyan/quant5

paligemma-support

pdevine/import-docs

jmorganca/openai-context

jyan/paligemma

jyan/p2

jyan/palitest

bmizerany/embedspeedup

jmorganca/llama-vit

brucemacd/allow-ollama

royh/ep-methods

royh/whisper

mxyng/api-models

mxyng/fix-memory

jyan/q4_4/8

jyan/ollama-v

royh/stream-tools

roy-embed-parallel

bmizerany/hrm

revert-5963-revert-5924-mxyng/llama3.1-rope

royh/embed-viz

jyan/local2

jyan/auth

jyan/local

jyan/parse-temp

jmorganca/template-mistral

jyan/reord-g

royh-openai-suffixdocs

royh-imgembed

royh-embed-parallel

jyan/quant4

royh-precision

jyan/progress

pdevine/fix-template

jyan/quant3

pdevine/ggla

mxyng/update-registry-domain

jmorganca/ggml-static

mxyng/create-context

jyan/v0.146

mxyng/layers-from-files

build_dist

bmizerany/noseek

royh-ls

royh-name

timeout

mxyng/server-timestamp

bmizerany/nosillyggufslurps

royh-params

jmorganca/llama-cpp-7c26775

royh-openai-delete

royh-show-rigid

jmorganca/enable-fa

jmorganca/no-error-template

jyan/format

royh-testdelete

bmizerany/fastverify

language_support

pdevine/ps-glitches

brucemacd/tokenize

bruce/iq-quants

bmizerany/filepathwithcoloninhost

mxyng/split-bin

bmizerany/client-registry

jmorganca/if-none-match

native

jmorganca/native

jmorganca/batch-embeddings

jmorganca/initcmake

jmorganca/mm

pdevine/showggmlinfo

modenameenforcealphanum

bmizerany/modenameenforcealphanum

jmorganca/done-reason

jmorganca/llama-cpp-8960fe8

ollama.com

bmizerany/filepathnobuild

bmizerany/types/model/defaultfix

rmdisplaylong

nogogen

bmizerany/x

modelfile-readme

bmizerany/replacecolon

jmorganca/limit

jmorganca/execstack

jmorganca/replace-assets

mxyng/tune-concurrency

jmorganca/testing

whitespace-detection

jmorganca/options

upgrade-all

scratch

cuda-search

mattw/airenamer

mattw/allmodelsonhuggingface

mattw/quantcontext

mattw/whatneedstorun

brucemacd/llama-mem-calc

mattw/faq-context

mattw/communitylinks

mattw/noprune

mattw/python-functioncalling

rename

mxyng/install

pulse

remove-first

editor

mattw/selfqueryingretrieval

cgo

mattw/howtoquant

api

matt/streamingapi

format-config

mxyng/extra-args

shell

update-nous-hermes

cp-model

upload-progress

fix-unknown-model

fix-model-names

delete-fix

insecure-registry

ls

deletemodels

progressbar

readme-updates

license-layers

skip-list

list-models

modelpath

matt/examplemodelfiles

distribution

go-opts

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/ollama#50124