[GH-ISSUE #8993] Question about how Ollama feeds structured_output into the model #5841

Closed
opened 2026-04-12 17:10:54 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @Sergic-Cell on GitHub (Feb 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8993

I tried getting an answer through Discord; trying to see if I have better luck here. I want to ask a question for my understanding:

I'm running the latest version of Ollama (0.5.7) and the latest version of the Python API (0.4.7). I am on Windows.
I'd like clarification on how to verify that the raw JSON format/schema is given to and received by the server.
Using llama3.1, you should be able to use both tool calls and structured outputs.
I understand how tool calls are received and how to verify them by looking at the raw prompt sent to the server when running from the command line with OLLAMA_DEBUG=1.
When the Python client makes a tool call to Ollama, the tools are passed into the tools property of the ChatRequest class along with the messages and model:

ChatRequest(
   model=model,
   messages=[message for message in copy_messages(messages)],
   tools=[tool for tool in _copy_tools(tools)],
   stream=stream,
   format=format,
   options=options,
   keep_alive=keep_alive,
)
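For context, the client-side call that produces a request like this looks roughly as follows. This is illustrative only: the tool definition is the same example schema as below, and the ollama.chat signature is per the 0.4.x Python client as I understand it.

import ollama

# Illustrative tool schema; the client forwards it verbatim in the
# `tools` field of the ChatRequest shown above.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_conditions",
        "description": "Get the current weather conditions for a specific location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g., San Francisco, CA"},
                "unit": {"type": "string", "enum": ["Celsius", "Fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}]

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "what is the weather like in San Francisco?"}],
    tools=tools,
)
print(response.message.tool_calls)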

For llama3.1, tools and messages are formatted into the raw prompt like in this example:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant with tool calling capabilities. When you receive a tool call response, use the output to format an answer to the orginal use question.<|eot_id|><|start_header_id|>user<|end_header_id|>

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{
    "type": "function",
    "function": {
    "name": "get_current_conditions",
    "description": "Get the current weather conditions for a specific location",
    "parameters": {
        "type": "object",
        "properties": {
        "location": {
            "type": "string",
            "description": "The city and state, e.g., San Francisco, CA"
        },
        "unit": {
            "type": "string",
            "enum": ["Celsius", "Fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location."
        }
        },
        "required": ["location", "unit"]
    }
    }
}

Question: what is the weather like in San Fransisco?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

That makes it clear how the model is able to call tools, since a tool call is itself essentially a structured output:

<|start_header_id|>assistant<|end_header_id|>
{"name": "get_current_conditions", "parameters": {"location": "San Francisco, CA", "unit": "Fahrenheit"}}
<|eot_id|>

But when forgoing tools in favor of a json_schema, we omit tools from the ChatRequest and instead pass the JSON schema in the format property. However, this format is not represented in the raw prompt of the model, at least not in the server console. If the format is not specified in the prompt, where is it given? Does llama3.1 take the format from another input stream that isn't the prompt? If so, how can we verify this in the server logs?
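For reference, this is roughly how I'm passing the schema on the client side. The Weather model is just an illustrative example (Pydantic v2); the format=...model_json_schema() pattern is how I understand the 0.4.x structured-outputs API.

import ollama
from pydantic import BaseModel

# Illustrative schema; any JSON schema dict works in `format`.
class Weather(BaseModel):
    location: str
    temperature: float
    unit: str

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "what is the weather like in San Francisco?"}],
    # The schema goes into `format` instead of `tools`, and it never
    # appears in the rendered prompt shown by OLLAMA_DEBUG=1.
    format=Weather.model_json_schema(),
)
print(Weather.model_validate_json(response.message.content))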

Thanks

GiteaMirror added the question label 2026-04-12 17:10:54 -05:00
Author
Owner

@rick-github commented on GitHub (Feb 10, 2025):

It's not recorded in the logs. Currently you need to monitor the connection between the server and the runner using something like tcpflow or tcpdump.
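For example, on Linux something along the lines of tcpdump -i lo -A 'tcp port <runner port>' will dump the request body the server forwards to the runner; the runner's port is picked when the model is loaded and should be visible in the server log line that launches the runner. On Windows a loopback capture with a tool such as Wireshark works similarly. Exact commands will vary with your setup.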
