[GH-ISSUE #5899] Grammar support #65717

Open
opened 2026-05-03 22:23:39 -05:00 by GiteaMirror · 2 comments

Originally created by @kelvinhammond on GitHub (Jul 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5899

Hello,

I understand the developers are very busy, but there are several pull requests open for grammar support, and it appears to be as simple as passing the value through to llama.cpp. Is it possible to get one of these pull requests merged so that it becomes possible to use more than just JSON?

Is there another way to support a format similar to ChatML instead of grammar? This is what I'm looking to achieve.
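For context, llama.cpp already exposes this directly; a minimal sketch using `llama-cli`'s existing `--grammar` flag (the model path and prompt are placeholders):

```
# Constrain output to "yes" or "no" with an inline GBNF grammar.
llama-cli -m ./model.gguf \
  -p "Is the sky blue? Answer yes or no." \
  --grammar 'root ::= "yes" | "no"'
```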

GiteaMirror added the feature request label 2026-05-03 22:23:39 -05:00

@rndmcnlly commented on GitHub (Apr 14, 2025):

I'm also interested in accessing the (GBNF) grammar support from llama.cpp in ollama. However, I have an idea for a way to do this without changing either the command-line or HTTP interfaces.

Consider treating grammars similarly to how we treat chat templates or adapters. Consider them to be part of the model rather than part of specific requests. Concretely, you might specify them as part of the Modelfile:

```
GRAMMAR """
root ::= "I'm " ("hungry" | "sleepy") " ".
"""
```

Associating the grammar with the model at load-time matches the usage pattern from llama.cpp utilities like `llama-cli` and `llama-server`.

This design is not as flexible as allowing people to change the grammar on a per-response basis. In exchange for this rigidity, it allows grammars to be distributed along with and recombined with pre-existing models via layered Modelfiles. By making the grammar into a server-side configuration, you can then make use of grammars in client apps that don't offer users the option of manually configuring a low-level grammar.
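To make the layering concrete, here is what that might look like, assuming the proposed GRAMMAR instruction existed (the base model name is just an example):

```
# Hypothetical Modelfile; GRAMMAR is the instruction proposed above, not an
# existing Modelfile keyword. FROM layers on top of an already-pulled model.
FROM llama3
GRAMMAR """
root ::= "yes" | "no"
"""
```

A deployment could then build it with `ollama create constrained -f Modelfile`, and any client talking to `constrained` would get grammar-bound output without needing to know grammars exist.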

Concrete application: Imagine a customer service chatbot that should use full intelligence while only responding within the bounds of a guaranteed-safe response grammar. Even if people find a way to send the chatbot a custom request, because the grammar is associated with the model on the server side, they can't trick it into responding outside of the pre-specified constraints.

Snag: Having an always-on grammar constraint like this doesn't play well with also having a per-response grammar (e.g., one derived from a JSON schema). CS theory tells us that the set of context-free languages is not closed under intersection, so we couldn't generally mechanically generate a new grammar that accepts only strings that pass both grammars. One response would be to say that the per-request format is ignored when using a model-level grammar. Another would be to reject the request. Yet another would be to interpret it as wanting the union of the languages: the generated response will be valid according to at least one of the grammars, as if the new grammar said `root ::= model_grammar_root | request_specific_grammar_root`. Or you could have the per-request grammar override the model-specific one. Any of these solutions is fine so long as it is documented.
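A sketch of that union interpretation in GBNF (the rule names and the request-side rule are illustrative; a real schema-derived rule would come out of ollama's JSON-to-GBNF conversion):

```
# Union of the two grammars: a response is valid if it matches either root.
root ::= model-grammar-root | request-specific-grammar-root
model-grammar-root ::= "I'm " ("hungry" | "sleepy")
# Stand-in for a rule generated from the request's JSON schema:
request-specific-grammar-root ::= "{" [^}]* "}"
```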

Stepping way back, it would also be fine to have a `grammar` option be part of the chat request (effectively a lower-level `format` option). This would be fine for my applications, but I'm more excited about the ability to bake formats into the Modelfile.
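For comparison, a per-request version might look like the following, where the `grammar` field is hypothetical (ollama's chat API currently accepts only `format`):

```
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Is the sky blue?"}],
  "grammar": "root ::= \"yes\" | \"no\"",
  "stream": false
}'
```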

@ExposedCat commented on GitHub (May 16, 2025):

Any update on this? I've seen some discussion about the developers not liking the idea, but I don't see any drawback to adding, say, an `unsafe_grammar` option that would simply skip the JSON-to-GBNF step and pass the value through as the grammar field to llama.cpp.
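A sketch of how that might look on the generate endpoint, with `unsafe_grammar` as the hypothetical pass-through field this comment proposes:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Heads or tails?",
  "unsafe_grammar": "root ::= \"heads\" | \"tails\"",
  "stream": false
}'
```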


Reference: github-starred/ollama#65717