[GH-ISSUE #6473] OpenAI Structured Output Compatibility #50584

Closed
opened 2026-04-28 16:26:35 -05:00 by GiteaMirror · 9 comments

Originally created by @jd-solanki on GitHub (Aug 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6473

Hi 👋🏻

Loving ollama always ❤️

I'm eager to use the newly released OpenAI structured output feature with Ollama, but it looks like Ollama isn't compatible yet; otherwise I could just point `base_url` at my local LLM and get responses.

Also, will it support [streaming](https://python.useinstructor.com/concepts/partial/) the way instructor does?

Thanks.

GiteaMirror added the feature request label 2026-04-28 16:26:35 -05:00

@vish0l commented on GitHub (Aug 23, 2024):

We can include it, but it will only function with models that support structured output.


@codefromthecrypt commented on GitHub (Aug 26, 2024):

not sure if this is helpful, but I see a PR for langchaingo's openai client https://github.com/tmc/langchaingo/pull/986


@wassimMsw commented on GitHub (Sep 11, 2024):

> We can include it, but it will only function with models that support structured output.

llama3.1 supports it, as mentioned [here](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1#json-based-tool-calling). For it to work, the following must be added to the prompt:

```
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
```

A solution would be to generate the prompt addition above from the `response_format` parameter of the OpenAI chat completions API. Would you consider doing this?

Example of the OpenAI `response_format` API:

```
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "math_response",
            "schema": {
                "type": "object",
                "properties": {
                    "steps": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "explanation": {"type": "string"},
                                "output": {"type": "string"}
                            },
                            "required": ["explanation", "output"],
                            "additionalProperties": False
                        }
                    },
                    "final_answer": {"type": "string"}
                },
                "required": ["steps", "final_answer"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
)
```
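For concreteness, the translation proposed above could look roughly like this on the server side. This is a minimal sketch; `prompt_addition_from_response_format` is a hypothetical helper, not existing Ollama code:

```
import json

def prompt_addition_from_response_format(response_format: dict) -> str:
    # Hypothetical helper: turn an OpenAI-style `response_format` payload
    # into the kind of prompt instruction llama3.1 was trained on.
    if response_format.get("type") != "json_schema":
        return ""
    schema = response_format["json_schema"]["schema"]
    return (
        "Respond only with JSON conforming to this JSON Schema. "
        "Do not use variables or add any other text.\n"
        + json.dumps(schema)
    )
```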


@cpfiffer commented on GitHub (Oct 16, 2024):

> We can include it, but it will only function with models that support structured output.

This is not a requirement for structured output.

[Outlines](https://github.com/dottxt-ai/outlines) supports any open-weight model, and you could easily turn Ollama into an OpenAI-compatible structured output server with more functionality than OpenAI's endpoint.

We're happy to help implement it. We've recently rewritten our backend in Rust, so it should not be outrageously difficult to bolt in. I get the sense that there's quite a lot of demand for it -- I've heard as much in private, in public, in issues here, etc.

I think at this point we need serious discussion with the Ollama team about how it would look in the internals.
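For anyone curious what that looks like in practice, here is a minimal sketch against Outlines' 0.x Python API; the model choice and prompt are illustrative:

```
import outlines
from pydantic import BaseModel

class MathStep(BaseModel):
    explanation: str
    output: str

# The schema is enforced at the token-sampling level, so this works with
# any open-weight model, not just ones trained for structured output.
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, MathStep)
step = generator("Explain the first step of solving 8x + 7 = -23.")
print(step)  # a validated MathStep instance
```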


@ashwinb commented on GitHub (Oct 22, 2024):

+1 for having this. And yes, there is no reason why this needs to be limited only to models that support structured output, as @cpfiffer mentions above.

Why are we on llama-stack interested? See https://github.com/meta-llama/llama-stack/pull/281


@mitar commented on GitHub (Oct 22, 2024):

There is no special need for models to support structured output. Any model can be used with structured output; for example, PR #5348 adds such support to Ollama. The only question is where the JSON Schema -> grammar conversion happens: inside Ollama or outside it. The grammar then has to be passed to llama.cpp (or another model runtime that supports grammars), which uses it to constrain token selection.

The issue with exposing grammars as a whole is that bad grammars can have quite a negative impact on both generation speed and quality. This is why most decide to expose only JSON Schema as an API surface (and even that with heavy limitations on the supported JSON Schema, as OpenAI does, BTW).
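As a sketch of the runtime half of that pipeline, llama-cpp-python exposes this mechanism today: it compiles a JSON Schema into a GBNF grammar and masks invalid tokens during sampling. The model path here is illustrative:

```
from llama_cpp import Llama

llm = Llama(model_path="llama-3.1-8b-instruct.Q4_K_M.gguf")  # illustrative path

# The schema is compiled to a grammar internally; sampling can then only
# produce tokens that keep the output valid against the schema.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me a name and an age."}],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
    },
)
print(out["choices"][0]["message"]["content"])
```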


@nadeesha commented on GitHub (Nov 30, 2024):

> There is no special need for models to support structured output.

Ok, this is fair. I implemented my own around ollama with `zod` and `async-retry` at the application level. It's somewhat trivial. [Guide](https://www.inferable.ai/blog/posts/llm-json-parser-structured-output) for anyone else wanting to do the same.
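The linked guide is TypeScript, but a rough Python analogue of the same pattern (validate against a schema, retry on failure) might look like this, assuming the official `ollama` Python client; the `Person` schema is illustrative:

```
import ollama
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

def structured_chat(prompt: str, retries: int = 3) -> Person:
    for _ in range(retries):
        resp = ollama.chat(
            model="llama3.1",
            messages=[{
                "role": "user",
                "content": f"{prompt}\nRespond only with JSON matching this "
                           f"schema: {Person.model_json_schema()}",
            }],
            format="json",  # JSON mode guarantees valid JSON, not the schema
        )
        try:
            return Person.model_validate_json(resp["message"]["content"])
        except ValidationError:
            continue  # schema mismatch: retry
    raise RuntimeError("model never produced schema-valid JSON")
```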


@dgutson commented on GitHub (Dec 2, 2024):

> > There is no special need for models to support structured output.
>
> Ok, this is fair. I implemented my own around ollama with `zod` and `async-retry` at the application level. It's somewhat trivial. [Guide](https://www.inferable.ai/blog/posts/llm-json-parser-structured-output) for anyone else wanting to do the same.

This is basically brute-force retrying; we need native support.


@ParthSareen commented on GitHub (Dec 5, 2024):

This will get rolled out with #7900! @dgutson @nadeesha @jd-solanki
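For readers arriving later: structured outputs did ship (Ollama 0.5), with the `format` field of the chat API accepting a JSON Schema. A minimal sketch with the Python client, where the model and schema are illustrative:

```
import ollama
from pydantic import BaseModel

class Answer(BaseModel):
    final_answer: str

# Ollama constrains decoding so the reply conforms to the supplied schema.
resp = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Solve 8x + 7 = -23."}],
    format=Answer.model_json_schema(),
)
print(Answer.model_validate_json(resp["message"]["content"]))
```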
