[GH-ISSUE #11037] Providing JSON Schema results in ignoring tool calls #7280

Closed
opened 2026-04-12 19:19:46 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @tinkerctl on GitHub (Jun 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11037

What is the issue?

Known models affected: `llama3.1:8b`, `qwen3:14b`

Prompt: `list all my repositories. return a json object as the response.`
Result schema provided:

```json
{
  "properties": {
    "repositories": {
      "description": "The result of repositories from the function call",
      "type": "array"
    }
  },
  "type": "object"
}
```

Example result w/ result schema (not correct, no tool calls requested):

```json
{"repositories":[{"name": "my_repo_1"},{"name": "my_repo_2"}]}
```

Example result without result schema (correct, tool calls requested):

```json
{
  "repositories": [
    {
      "id": 1,
      "name": "foo",
      "full_name": "myorg/foo",
      "description": "",
      "private": false,
    }
  ]
}
```

Depending on the model, I'll either get an empty JSON response or a hallucination. When dropping the schema, tool calls are properly requested. I'll try to find some time to recreate this with curl at a later date so we have something easier to reproduce with.
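
In the meantime, here's a rough sketch of the failing pattern using the ollama Python client (the `list_repositories` tool is a hypothetical stand-in for my real tool; the schema is the one above):

```python
#!/usr/bin/env python3
# Sketch of the failing pattern: passing both tools and a format schema in the
# same chat call. list_repositories is a hypothetical stand-in tool.

import ollama

schema = {
  "properties": {
    "repositories": {
      "description": "The result of repositories from the function call",
      "type": "array",
    }
  },
  "type": "object",
}

def list_repositories():
  """List all repositories."""
  return ["my_repo_1", "my_repo_2"]

response = ollama.chat(
  model="qwen3:14b",
  messages=[{"role": "user", "content": "list all my repositories. return a json object as the response."}],
  tools=[list_repositories],
  format=schema,  # with the schema set, no tool calls come back
  stream=False,
)
# Expected: a tool call to list_repositories. Observed: None (or a hallucinated answer).
print(response.message.tool_calls)
```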

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.9.0

GiteaMirror added the bug label 2026-04-12 19:19:46 -05:00

@tinkerctl commented on GitHub (Jun 10, 2025):

Tested on 0.8.0 and 0.7.1. Same result.


@rick-github commented on GitHub (Jun 10, 2025):

If you are sending the schema along with the tool list and the prompt "list all my repositories. return a json object as the response", then you are asking the model to make the tool calls conform to the schema of the response, which is not what you want. You need to send the schema with the output of the tool calls.

```python
#!/usr/bin/env python3

import ollama

from pydantic import BaseModel
from typing import List

model = "qwen3:14b"
prompt = "list all my repositories. return a json object as the response"

class RepoList(BaseModel):
  repositories: List[str]

def list_repositories():
  """
    List all repositories
  """
  return ["my_repo_1", "my_repo_2"]

tools = [ list_repositories ]
toolmap = {f.__name__:f for f in tools}

messages = [{"role":"user","content":prompt}]

response = ollama.chat(model=model, messages=messages,tools=tools,stream=False)

if response.message.tool_calls:
  for tool in response.message.tool_calls:
    if func := toolmap.get(tool.function.name):
      output = func(**tool.function.arguments)
      messages.append({"role":"tool", "content":str(output), "name":tool.function.name})
  response = ollama.chat(model=model, messages=messages, stream=False, format=RepoList.model_json_schema())

print(response.message)
```

```console
$ ./11037.py
role='assistant' content='{"repositories": ["my_repo_1", "my_repo_2"]}' thinking=None images=None tool_calls=None
```

@tinkerctl commented on GitHub (Jun 10, 2025):

I see. If that's the case, that would mean I need to know ahead of time that the model is not going to request additional tool calls.

I'm new to functions, so please correct me if I'm wrong, but isn't that a bit limiting? That would limit the number of round trips to a depth of 1, instead of allowing the model to ask for more information.

EDIT: Thank you so much for taking the time to provide a working example!


@rick-github commented on GitHub (Jun 10, 2025):

There are a couple of ways to deal with this, with some caveats.

First is to create a function that will fulfill all of the requirements of a tool call in one shot. For example, if you want to modify a file and then get its size, instead of defining two tools, `tools = [ modify_file, get_file_size ]`, you create a single function, `tools = [ modify_file_and_get_file_size ]`.
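
As a sketch, assuming hypothetical file-helper names (not from any real codebase), a merged tool might look like this:

```python
import os

# Hypothetical merged tool: one function performs both steps, so a single
# tool-call round trip is enough instead of chaining two calls.
def modify_file_and_get_file_size(path: str, content: str) -> int:
  """Write content to path and return the new file size in bytes."""
  with open(path, "w") as f:
    f.write(content)
  return os.path.getsize(path)
```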

The second is to accumulate the results of tool calls until the model stops asking. The penultimate call returns without a tool call, and then the last call is made to allow the model to synthesize an answer from the previous calls.

```diff
--- 11037.py.orig	2025-06-10 22:49:19.491590165 +0200
+++ 11037.py	2025-06-10 22:50:43.627826277 +0200
@@ -22,13 +22,14 @@
 
 messages = [{"role":"user","content":prompt}]
 
-response = ollama.chat(model=model, messages=messages,tools=tools,stream=False)
-
-if response.message.tool_calls:
+while True:
+  response = ollama.chat(model=model, messages=messages,tools=tools,stream=False)
+  if not response.message.tool_calls:
+    break
   for tool in response.message.tool_calls:
     if func := toolmap.get(tool.function.name):
       output = func(**tool.function.arguments)
       messages.append({"role":"tool", "content":str(output), "name":tool.function.name})
-  response = ollama.chat(model=model, messages=messages, stream=False, format=RepoList.model_json_schema())
 
+response = ollama.chat(model=model, messages=messages, stream=False, format=RepoList.model_json_schema())
 print(response.message)
```

The problem here is that while tool use is a pretty neat application of LLM technology, models can be unreliable, especially if they are small and only moderately good at tool use, which covers most of the freely available models (i.e., models from ollama and HF). The more complex the description of a tool, the easier it is for the model to misuse it. Similarly, when making repeated tool calls, the client should enact some sanity checking to ensure that the model hasn't fallen into an infinite loop, e.g. by breaking out of the loop if a tool call is repeated. The unreliability of a model can be mitigated somewhat by using models that are specifically trained for tool use, e.g. [firefunction](https://ollama.com/library/firefunction-v2) or [xlam-2](https://huggingface.co/Salesforce/Llama-xLAM-2-70b-fc-r). The downside is that these models tend to be large and less suitable for general-purpose inference.
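
As an example of that kind of guard, here's a minimal sketch layered on the loop above; the round limit and the duplicate-call check are arbitrary choices, not anything ollama enforces:

```python
# Sketch of guarding the tool-call loop: cap the number of rounds and stop if
# the model repeats a tool call it has already made. Reuses model, messages,
# tools and toolmap from the earlier script.
MAX_ROUNDS = 5
seen_calls = set()

for _ in range(MAX_ROUNDS):
  response = ollama.chat(model=model, messages=messages, tools=tools, stream=False)
  if not response.message.tool_calls:
    break
  calls = {(t.function.name, str(t.function.arguments)) for t in response.message.tool_calls}
  if calls & seen_calls:  # the model is repeating itself; assume it's stuck
    break
  seen_calls |= calls
  for tool in response.message.tool_calls:
    if func := toolmap.get(tool.function.name):
      output = func(**tool.function.arguments)
      messages.append({"role":"tool", "content":str(output), "name":tool.function.name})

response = ollama.chat(model=model, messages=messages, stream=False, format=RepoList.model_json_schema())
print(response.message)
```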

The upshot is that when using models for tools, there are a number of factors to take into account: prompting, function design, tool call result verification, tool call retries in error cases, result aggregation, etc. It's currently not a point-and-shoot operation. There are frameworks that do some of this management and allow integration into agentic workflows, which are a big consumer of tool-enabled models. It's all pretty neat, but if you want to go down this path, be prepared for some frustration.


@tinkerctl commented on GitHub (Jun 10, 2025):

Ah, I didn't think about just incurring the cost of an additional call to the model to have it reformat everything. Thank you.

> Similarly, when making repeated tool calls, the client should enact some sanity checking to ensure that the model hasn't fallen into an infinite loop

I intend to handle this with Go's context package (timeout and perhaps some new "max call to llm" logic).

As a newcomer, I do think there's either a slight bug or an opportunity for doc uplift here. If I provide a schema in the `format` field, which is currently documented as only affecting the response format (https://github.com/ollama/ollama/blob/main/api/types.go#L72:L73), then I don't think it should have an impact on the schema already defined in the `Tools` field (https://github.com/ollama/ollama/blob/main/api/types.go#L228-L239).


@rick-github commented on GitHub (Jun 10, 2025):

It doesn't directly affect the schema of the `Tools` field. What `format` does is create a GBNF grammar that modifies the token probability function such that generated tokens must match the grammar. So what you were doing earlier was preventing the model from generating tokens that matched the `Tools` field, because they were limited to tokens defined by the result schema. This is an application of [structured outputs](https://ollama.com/blog/structured-outputs). You're right that the impact on tool use is not explicitly called out, but it's understood that using structured outputs constrains the response from a model.


Reference: github-starred/ollama#7280