[PR #4525] [CLOSED] Exposing grammar as a request parameter in completion/chat with go-side grammar validation #42762

Closed
opened 2026-04-24 22:29:45 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4525
Author: @richardanaya
Created: 5/19/2024
Status: Closed

Base: main ← Head: main


📝 Commits (4)

  • 80b46f7 Exposing grammar as a request parameter in completion/chat with go-side grammar validation
  • 1181b8a Merge branch 'ollama:main' into main
  • 026f6c3 adding cacheing and new test
  • 36cf87f Merge branch 'main' into main

📊 Changes

6 files changed (+1130 additions, -0 deletions)

View changed files

📝 api/types.go (+6 -0)
📝 docs/api.md (+16 -0)
➕ llm/grammar.go (+537 -0)
➕ llm/grammar_test.go (+548 -0)
📝 llm/server.go (+21 -0)
📝 server/routes.go (+2 -0)

📄 Description

Why is passing down grammars needed?

Relying upon the context of a prompt to dictate structure can be unreliable (because it's dependent on the model and on generation randomness) and takes up context space. A grammar is a well-proven way to constrain generated output, and in fact format="JSON" depends on one. But format="JSON" allows no reliable specification of large, complex structures and can even be tricked with prompt attacks.

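As an illustrative sketch of how the new parameter could be used (the endpoint and surrounding fields follow the existing Ollama completion API; the model name here is just an example), a request constrained to a yes/no answer might look like:

```json
{
  "model": "llama3",
  "prompt": "Is the sky blue? Answer yes or no.",
  "stream": false,
  "grammar": "root ::= \"yes\" | \"no\""
}
```

With the grammar attached, the model can only emit tokens that match the `root` rule, regardless of what the prompt says.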

Why grammar and not JSON schema?

While JSON Schema would make a nice future addition, there's interest in constraining data structures outside of JSON (simple enum values, programming languages, etc.). Also, JSON Schema generators fundamentally rely on grammars underneath, so grammars generated from a JSON Schema would also benefit from grammar checking.
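For example, a non-JSON constraint like a bare enum is trivial to express directly in GBNF (this particular grammar is illustrative, not from the PR):

```
root ::= "low" | "medium" | "high"
```

Expressing the same constraint through JSON Schema would force the output to be a quoted JSON string rather than a bare token.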

Why not just pass along the grammar to llama.cpp?

I looked into the complexities of passing the grammar along to the llama.cpp server. There are a few challenges:

  • the llama.cpp server doesn't return errors when a bad grammar is passed to it with streaming mode on. It gives an incomprehensible "unexpected EOF"
  • the in-memory model is reused if the grammar is valid or changed. BUT the in-memory model appears to get reloaded if you give it a bad grammar and then follow up with a good one.
  • reusing in-memory models appears to work perfectly as long as only completely valid grammars are passed along (even a variety of valid grammars)

My conclusion from this, given the advice of the community, is that we do indeed have to do our own GBNF grammar validation on the Go server side to do our best to prevent passing down a bad grammar.
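As a minimal sketch of what Go-side validation means in spirit (this is not the PR's actual implementation, which is ~537 lines in llm/grammar.go; the function name and the checks shown are hypothetical and far shallower than the real parser):

```go
package main

import (
	"fmt"
	"strings"
)

// validateGrammar is a hypothetical, simplified check: every
// non-empty, non-comment line must either define a rule with "::="
// or continue a previous rule body. A real GBNF validator would
// also parse rule bodies, character classes, repetition, etc.
func validateGrammar(g string) error {
	rules := 0
	for i, line := range strings.Split(g, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		name, _, found := strings.Cut(line, "::=")
		if !found {
			continue // treat as a continuation of the previous rule
		}
		if strings.TrimSpace(name) == "" {
			return fmt.Errorf("line %d: missing rule name before ::=", i+1)
		}
		rules++
	}
	if rules == 0 {
		return fmt.Errorf("grammar defines no rules")
	}
	return nil
}

func main() {
	fmt.Println(validateGrammar(`root ::= "yes" | "no"`)) // <nil>
	fmt.Println(validateGrammar(""))                      // grammar defines no rules
}
```

The point is that errors like these surface as readable messages at the Ollama API layer, instead of an "unexpected EOF" deep inside a llama.cpp stream.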


In this PR I've created:

  • the functionality to pass along grammar in chat and completion mode
  • documentation in docs/api.md for the new property
  • prevention of using the grammar and json format parameters at the same time
  • validation code for grammars
  • an extensive set of 30+ grammar tests covering character classes, strings, internationalization, comments, etc.
  • tests of every known grammar shipped with llama.cpp, plus individual unit tests
  • no use of regex, to keep the parsing clear and understandable
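The grammar/format mutual exclusion mentioned above could be sketched like this (the type and function names here are hypothetical, not the PR's actual code in server/routes.go):

```go
package main

import (
	"errors"
	"fmt"
)

// GenerateRequest is a hypothetical subset of the request type,
// showing only the two fields that conflict.
type GenerateRequest struct {
	Format  string // e.g. "json"
	Grammar string // raw GBNF text
}

// checkFormatGrammar rejects requests that set both parameters,
// since format="json" already installs its own grammar.
func checkFormatGrammar(r GenerateRequest) error {
	if r.Grammar != "" && r.Format != "" {
		return errors.New("grammar and format cannot be used at the same time")
	}
	return nil
}

func main() {
	fmt.Println(checkFormatGrammar(GenerateRequest{Format: "json", Grammar: `root ::= "a"`}))
	fmt.Println(checkFormatGrammar(GenerateRequest{Grammar: `root ::= "a"`}))
}
```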

Edge cases:

  • I've probably not implemented everything that's possible in character classes, but I support a limited subset compatible with the grammars shipped with llama.cpp. My assumption is that most people's grammars will be less complex than these.
  • there might be some valid grammars I don't currently support (though to the best of my knowledge we support all the major publicly available ones, including ones as complex as the C programming language). I chose not to use a full-on Go parser library because I wanted the cognitive load of this code to be approachable initially (rather than requiring every reader of this code to learn a new library). If, in the future, we want to replace it with a more formal technology, we can, and the tests can be reused.
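For reference, this is the kind of character-class syntax in question, in the GBNF notation used by llama.cpp (the specific grammar below is illustrative, not taken from the PR's test suite):

```
root ::= word (" " word)*
word ::= [a-zA-Z] [a-zA-Z0-9-]*
```

Ranges like `[a-z]`, negation, and repetition inside classes are where a hand-rolled validator most easily diverges from llama.cpp's own parser, hence the "limited subset" caveat above.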

Examples of success:

(two screenshots, 2024-05-19: successful grammar-constrained responses)

Example of failure:

(screenshot, 2024-05-19: validation error returned for an invalid grammar)

I believe this PR satisfies https://github.com/ollama/ollama/issues/4074 with an acceptable amount of protection from sending invalid GBNF grammars with useful error messages.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


Reference: github-starred/ollama#42762