[GH-ISSUE #12559] Non-deterministic structured output with same seed #70390

Open
opened 2026-05-04 21:23:29 -05:00 by GiteaMirror · 3 comments

Originally created by @ndido98 on GitHub (Oct 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12559

What is the issue?

I'm running into an issue in which the output is not deterministic between runs when I ask for structured output instead of plaintext.

See the following Python script:

```python
import ollama
from pydantic import BaseModel

class RandomNumber(BaseModel):
    value: int

class RandomNumbers(BaseModel):
    numbers: list[RandomNumber]

response = ollama.chat(
    model="phi4:14b",
    messages=[
        {"role": "user", "content": "Generate 100 random numbers between 1 and 100"}
    ],
    format=RandomNumbers.model_json_schema(),
    options=ollama.Options(
        seed=42,
        temperature=1.0,
        num_gpu=9999,
        top_p=0.95,
        top_k=40,
    ),
)
model = RandomNumbers.model_validate_json(response.message.content)
print([n.value for n in model.numbers])
```

If I run this script twice, I expect to get the same list of numbers, but instead I get two very different ones. However, if I comment out the format keyword argument and just print the model's raw response, that output is deterministic and is repeated consistently across runs, as expected. In other words, it looks like structured output somehow breaks the determinism I expected.
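For clarity, here is a minimal sketch of that comparison (it reuses the same ollama client calls as the script above; the run helper is mine, and note that it compares two calls inside one process, which may also be affected by the caching mentioned further down, whereas the original report compares separate script runs):

```python
import ollama
from pydantic import BaseModel

class RandomNumber(BaseModel):
    value: int

class RandomNumbers(BaseModel):
    numbers: list[RandomNumber]

OPTIONS = ollama.Options(seed=42, temperature=1.0, top_p=0.95, top_k=40)
MESSAGES = [{"role": "user", "content": "Generate 100 random numbers between 1 and 100"}]

def run(fmt=None) -> str:
    # One chat call; fmt is a JSON schema for structured output, or None for plain text.
    response = ollama.chat(model="phi4:14b", messages=MESSAGES, format=fmt, options=OPTIONS)
    return response.message.content

# Plain text with a fixed seed: expected (and observed) to be identical across calls.
print("plain text identical:", run() == run())

# Structured output with the same seed and schema: observed to differ.
schema = RandomNumbers.model_json_schema()
print("structured identical:", run(schema) == run(schema))
```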

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.12.3

GiteaMirror added the bug label 2026-05-04 21:23:29 -05:00

@ghmer commented on GitHub (Oct 10, 2025):

Doesn't the temperature also control how "random" a generated answer is?
I was thinking of the seed as "this is the starting point for the generator": temperature == 0 is deterministic, and every temperature above 0 adds some "randomness" to the output.
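To make the temperature half of that concrete, here is a toy sketch (standalone NumPy, not Ollama's actual sampler): temperature rescales the logits before the softmax, so temperature == 0 collapses to always picking the top token, while higher temperatures flatten the distribution the sampler draws from.

```python
import numpy as np

def sample_probs(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Greedy limit: temperature 0 puts all probability mass on the highest logit.
    if temperature == 0:
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0
        return probs
    # Otherwise rescale the logits and apply a softmax.
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5])
print(sample_probs(logits, 0.0))   # [1., 0., 0.]  -> deterministic argmax
print(sample_probs(logits, 1.0))   # mass spread over all tokens
print(sample_probs(logits, 2.0))   # flatter still
```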


@ndido98 commented on GitHub (Oct 10, 2025):

In my mental model the seed should also affect the sampler, so with a fixed seed I should not always get the most probable token (since temperature > 0); however, across different calls I expect the same tokens in the same order when using the same seed, because the sampler should make the same random choices.

EDIT: I also tried explicitly setting the temperature to 0, but two different runs still yield different responses. However, after the second run, all responses are the same for a while; I suppose I am hitting some sort of cache.
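The seed half of that expectation can be illustrated with a toy sketch along the same lines (again NumPy, not Ollama's sampler): with temperature > 0 the draws are random, but a seeded generator reproduces the same sequence of sampled tokens every time it is re-run with the same seed.

```python
import numpy as np

def sample_sequence(seed: int, temperature: float = 1.0, steps: int = 5) -> list[int]:
    rng = np.random.default_rng(seed)      # fixed seed -> fixed stream of random draws
    logits = np.array([2.0, 1.0, 0.5])
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Each step samples a "token" index from the same distribution using the seeded RNG.
    return [int(rng.choice(len(probs), p=probs)) for _ in range(steps)]

# Same seed -> the same sequence of sampled tokens, even though temperature > 0.
print(sample_sequence(42) == sample_sequence(42))   # True
# Different seed -> usually a different sequence.
print(sample_sequence(42) == sample_sequence(7))    # typically False
```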


@ndido98 commented on GitHub (Oct 10, 2025):

Probably related to #5321 and #5760?
