[GH-ISSUE #10799] Failed to create new sequence: unable to create sampling context #7091

Closed
opened 2026-04-12 19:02:06 -05:00 by GiteaMirror · 3 comments

Originally created by @Sweeper777 on GitHub (May 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10799

### What is the issue?

I get

```
{"error":"Failed to create new sequence: unable to create sampling context\n"}
```

for

```
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{"format":{"items":{"type":"string"},"type":"array"},"messages":[{"role":"user","content":"Give me some random strings"}],"model":"llama3.2", "stream":true}'
```

It seems like it is having trouble with minified JSON. If I do

```
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
   "format":{
      "items":{
         "type":"string"
      },
      "type":"array"
   },
   "messages":[
      {
         "role":"user",
         "content":"Give me some random strings"
      }
   ],
   "model":"llama3.2",
   "stream":false
}'
```

then it works as expected.

### Relevant log output

```shell
parse: error parsing grammar: expecting ::= at 

root ::= "[" space (string ("," space string)*)? "]" space
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
string ::= "\"" char* "\"" space
space
llama_grammar_init_impl: failed to parse grammar
time=2025-05-21T13:24:05.843+01:00 level=INFO source=server.go:809 msg="llm predict error: Failed to create new sequence: unable to create sampling context"
```

### OS

macOS

### GPU

Apple

### CPU

Apple

### Ollama version

0.7.0

GiteaMirror added the bug label 2026-04-12 19:02:06 -05:00

@rick-github commented on GitHub (May 21, 2025):

It's actually the length of the schema that's causing the problem. In 0.7.0, #10649 moved away from using a static buffer for the grammar, instead scaling the buffer based on the length of the input schema. Unfortunately the scaled buffer proved too small for short schemas, so the generated grammar gets truncated and then fails later parsing. A fix is pending in #10747. In the meantime, padding the schema with spaces will work around the problem.

```console
$ curl -sX POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{"format":{"items":{"type":"string"},"type":"array" },"messages":[{"role":"user","content":"Give me some random strings"}],"model":"llama3.2", "stream":false}' | jq
{
  "model": "llama3.2",
  "created_at": "2025-05-21T12:50:57.778590829Z",
  "message": {
    "role": "assistant",
    "content": "[\"_F2pA\",\":^G1b6w\",\"{xKd8Rt\",\"<|fU8aD9j\"]"
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 572723769,
  "load_duration": 286018703,
  "prompt_eval_count": 30,
  "prompt_eval_duration": 10051923,
  "eval_count": 34,
  "eval_duration": 275903586
}
```
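The failure mode described above can be sketched in a few lines, reusing the grammar rules from the log. This is a minimal illustration with assumed numbers, not Ollama's actual buffer code: the final `space ::= " "?` rule and the exact 4-bytes-per-schema-byte sizing are assumptions taken from the comment and the truncated log.

```python
# Illustration only, NOT Ollama's code: why a grammar buffer sized as a
# fixed multiple of the schema length can truncate the generated grammar.
schema = '{"items":{"type":"string"},"type":"array"}'

# The first three rules are copied verbatim from the log; the final
# `space` rule is an assumption (the log is cut off before it appears).
grammar = "\n".join([
    'root ::= "[" space (string ("," space string)*)? "]" space',
    r'char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})',
    r'string ::= "\"" char* "\"" space',
    'space ::= " "?',
]) + "\n"

buf_size = 4 * len(schema)      # 4 bytes per schema byte, per the comment
truncated = grammar[:buf_size]  # what an undersized buffer keeps

# The buffer undershoots the full grammar, cutting off mid-rule and
# leaving the dangling "space" seen in the log output.
print(buf_size, len(grammar))
print(truncated.splitlines()[-1].strip())
```

With a longer schema the multiplier covers the fixed-size boilerplate rules and the truncation disappears, which is why padding the request works around the bug.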

@Sweeper777 commented on GitHub (May 21, 2025):

I see. I cannot change the formatting of the JSON because I am using [someone else's wrapper](https://github.com/kevinhermawan/OllamaKit), which handles all the JSON serialisation. However, I *can* add other unrelated random key-value pairs (which seem to be ignored) to the schema to make it longer. Is there a fixed minimum length at which the buffer will be long enough not to truncate the grammar?
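That padding idea can be sketched client-side. This assumes the wrapper will serialize whatever schema dict it is given, and that an unknown JSON Schema keyword (the `x-pad` name below is made up, not a documented field) is ignored by Ollama's schema-to-grammar conversion, as the comments in this thread suggest:

```python
import json


def pad_schema(schema: dict, min_len: int = 256) -> dict:
    """Add a throwaway keyword so the serialized schema reaches min_len
    characters. min_len is a guess; per this thread there is no known
    minimum, so pick something comfortably large."""
    padded = dict(schema)
    padded["x-pad"] = ""  # hypothetical ignored keyword
    shortfall = min_len - len(json.dumps(padded, separators=(",", ":")))
    if shortfall > 0:
        padded["x-pad"] = "#" * shortfall
    return padded


schema = {"items": {"type": "string"}, "type": "array"}
padded = pad_schema(schema)
print(len(json.dumps(padded, separators=(",", ":"))))  # 256
```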


@rick-github commented on GitHub (May 21, 2025):

> Is there a fixed minimum length

Sadly, no. I thought about this when I did the initial change but couldn't come up with a compelling reason to have a minimum length, so I actually removed the code I had added for that. Adding unrelated random values should work: for every character added to the schema, the grammar buffer is extended by 4 characters.

Reference: github-starred/ollama#7091