[GH-ISSUE #1749] The "seed" is not working reliably for me. #1000

Closed
opened 2026-04-12 10:42:39 -05:00 by GiteaMirror · 20 comments

Originally created by @oderwat on GitHub (Dec 30, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1749

Originally assigned to: @BruceMacD on GitHub.

I am using a seed (int 1) for prompt generation with a mistral model, and it does not work reliably. Instead, I get some interesting results with a pattern:

EDIT: It seems like this behavior is independent of the seed choice and the seed is not working at all?

When I freshly start `ollama serve` and send the exact same prompt together with a seed to `/api/generate` (stream: false), I always get the same reply three times. The fourth and all following replies are then different!

When I switch to another model and send it the same prompt, it also gives three identical replies and then varying results!

As a workaround, I switch to another, very small model and create a minimal embedding (which is faster than doing an inference prompt) before doing the actual prompt, and this gives me reliable results. Even though that works and is quite fast in my case, I think there is a problem that needs to be fixed.
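
Roughly, the workaround looks like this (a sketch only; the small embedding model `all-minilm` and the Python `requests` usage are illustrative assumptions, not the exact code in use):

```python
import requests

OLLAMA = "http://localhost:11434"

def reset_then_generate(prompt: str, seed: int) -> str:
    # Workaround: first hit the embeddings endpoint with a very small model
    # ("all-minilm" is only an example) before running the real prompt.
    requests.post(f"{OLLAMA}/api/embeddings",
                  json={"model": "all-minilm", "prompt": "x"}).raise_for_status()

    # Then run the actual prompt with the fixed seed.
    r = requests.post(f"{OLLAMA}/api/generate", json={
        "model": "mistral",
        "prompt": prompt,
        "stream": False,
        "options": {"seed": seed},
    })
    r.raise_for_status()
    return r.json()["response"]
```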

I am on the current main [2a2fa3c](https://github.com/jmorganca/ollama/commit/2a2fa3c3298194f4f3790aade78df2f53d170d8e)

GiteaMirror added the bug label 2026-04-12 10:42:39 -05:00

@technovangelist commented on GitHub (Jan 2, 2024):

Interesting, I tried out this code:

```js
async function test() {
	const body = {
		"model": "mistral",
		"prompt": "list 3 synonyms for a sink",
		"stream": false,
		"options": {
			"seed": 12345,
			"temperature": 0
		}
	};
	const response = await fetch("http://localhost:11434/api/generate", {
		"method": "POST",
		"body": JSON.stringify(body),
	});

	const out = await response.json();

	console.log(out.response);
}

test();
test();
test();
test();
test();
test();
test();
test();
test();
test();
```

When I leave out the temperature, I get somewhat random responses each time. But with both the seed and the temperature set, I get the same result every single time.


@BruceMacD commented on GitHub (Jan 2, 2024):

The suggestion above (using temperature) helps a lot, although I do occasionally still see some variation after many requests.

Some other things to note:

  • There was a bug in `main` where the seed wouldn't be set properly, which is fixed in #1761. If you weren't building from source, this won't have been the issue.
  • If you're setting `temperature` in the interactive `ollama run`, the chat history will also affect what is generated.

@oderwat commented on GitHub (Jan 2, 2024):

I recompiled from main with the mentioned PR merged and basically have the same result as before (I thought it was different, but I still had my "reset" workaround active).

@technovangelist have you tried other seeds?


@oderwat commented on GitHub (Jan 2, 2024):

@technovangelist This is what I get using the current main and your little script (pointing at my server and using 'dolphin2.2-mistral:7b-q4_K_M')

(screenshot: https://github.com/jmorganca/ollama/assets/719156/13efced0-73eb-4fb5-9c21-aca30bd3d79d)

This is after restarting the ollama server and using a different seed:

(screenshot: https://github.com/jmorganca/ollama/assets/719156/2ce2659f-d707-44c2-bcba-2334dd86e5a7)

P.S.: The ollama server runs on WSL 2 with an RTX 3090 and 64 GB RAM.

EDIT: I started the ollama server on my iMac with the same model and script. There it replies twice with identical output and starts to deviate on the third call, again independent of the seed I use.


@oderwat commented on GitHub (Jan 3, 2024):

Using the newest main (a00367b2f92) I still can't get the seed to work reliably. But I wonder if that is "just me" or if this is a confirmed problem. From all my experiments it looks as if there is no real support for a 'seed', or it is implemented in an unusable way.


@technovangelist commented on GitHub (Jan 4, 2024):

Thanks @oderwat for checking. I know @BruceMacD is looking into it. We added a bug label to the issue so we will continue investigating.


@jmorganca commented on GitHub (Feb 20, 2024):

@oderwat for consistent output both `seed` must be set to a given number, and `temperature` must be set to 0. Let me know if this doesn't help.

I've added some examples of reproducible outputs to `api.md`:

https://github.com/ollama/ollama/blob/main/docs/api.md#request-reproducible-outputs
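
For example, a minimal request along those lines (a sketch in Python; the prompt text and seed value are arbitrary):

```python
import requests

def generate(seed: int) -> str:
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "mistral",
        "prompt": "Why is the sky blue?",
        "stream": False,
        "options": {
            "seed": seed,      # fixed seed
            "temperature": 0,  # greedy decoding, no sampling randomness
        },
    })
    r.raise_for_status()
    return r.json()["response"]

# With the same seed and temperature 0 the output should be identical across calls.
print(generate(123) == generate(123))
```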


@d-kleine commented on GitHub (Jun 27, 2024):

@jmorganca To make the output reproducible on the same OS, you need to set a fixed context length with `num_ctx` too (otherwise the output differs slightly when it is not set).

Also, having set `seed`, `temperature` to 0, and `num_ctx`, the output is deterministic and reproducible, but inconsistent across different operating systems. Might be related to llama.cpp: https://github.com/ggerganov/llama.cpp/discussions/2100
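
Concretely, such a request would look something like this (a sketch; the model name, prompt, and the 4096 value are only placeholders):

```python
import requests

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "mistral",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {
        "seed": 123,       # fixed seed
        "temperature": 0,  # deterministic token selection
        "num_ctx": 4096,   # pin the context length explicitly as well
    },
})
r.raise_for_status()
print(r.json()["response"])
```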


@jtyska commented on GitHub (Nov 13, 2024):

Hey! I'm still having this problem. When I set temperature to 0 and use different seed values, the generated content is always the same. Any clue on how to fix it? Here is the payload I'm using to generate content via the API (`/api/generate`):

"model": "qwen2.5:72b",
"prompt": "Generate something",
"options":{
"seed": 99, # here I change to 100, 101, 102, 103 anyvalue and the generated content is always the same
"temperature":0
},
"stream": False

        My ollama version is 0.4.1 got from the docker image on dockerhub.
        
       Could anyone give me a hand on this?

@d-kleine commented on GitHub (Nov 13, 2024):

@jtyska You probably need to pass a fixed context length too, as this is not set in the template: https://ollama.com/library/qwen2:72b/blobs/f02dd72bb242. The reason is that the context length will be random then too, leading to different lengths (and therefore also different tokens) being generated.

So try this by adding:

```python
...
"num_ctx": 32_768
...
```

to your code and let us know if this fixes the issue for you.


@oderwat commented on GitHub (Nov 13, 2024):

It sounds counterintuitive that making more stuff "fixed" will suddenly create different outputs when they were the same with different seeds before. Could you elaborate on why that is?

> The reason is that the context length will be random then too, leading to different lengths (and therefore also different tokens) being generated.

?

And why would the context be random anyway? The context of a model is a fixed value. What may change is what you feed to the model by changing or shortening the prompt. What am I missing?


@d-kleine commented on GitHub (Nov 13, 2024):

@oderwat Sorry, you are right – I was juggling too many things at once and got confused. The context length is indeed fixed, and what changes is the input or output length based on the prompt or generation settings.

@jtyska I just tested the settings in Ollama; you should be able to get consistent outputs per seed with the `seed` setting. If you want variation across different seeds, you need to increase the temperature to a value >0 (for instance 0.7). So your settings would be:

```python
{
    "model": "qwen2.5:70b",
    "messages": [
        {
            "role": "user",
            "content": "Generate something"
        }
    ],
    "options": {
        "seed": 101,
        "temperature": 0.7
    },
    "stream": False
}
```

Then every generated output is consistent when using the same seed, but varies across different seeds.


@jtyska commented on GitHub (Nov 13, 2024):

If you increase the temperature, reproducibility will be impossible. The answer for the same seed, with temperature 0, should be the same, but different seeds, with temperature zero, should produce different responses. Isn't that right? Otherwise, what's the point of having a seed parameter?


@d-kleine commented on GitHub (Nov 13, 2024):

Please try out the settings I have provided, and let me know if that worked for you.

> If you increase the temperature, reproducibility will be impossible. The answer for the same seed, with temperature 0, should be the same, but different seeds, with temperature zero, should produce different responses. Isn't that right? Otherwise, what's the point of having a seed parameter?

Regarding your question, that's not right - the seed initializes a random number generator used during text generation, changing which tokens will be chosen (i.e., for **temperature > 0**). A fixed seed ensures consistency and reproducibility in such cases. However, when **temperature = 0**, the model becomes fully deterministic (always selecting the most likely token), meaning there is no randomness in token selection and changing the seed will then not affect the output.
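
A toy illustration of that point (plain Python, nothing Ollama-specific; the tiny three-token distribution is made up): with greedy selection the seed is never consulted, while seeded sampling is reproducible per seed.

```python
import random

# Made-up next-token distribution for illustration only.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.2}

def pick(temperature: float, seed: int) -> str:
    if temperature == 0:
        # Greedy decoding: always the most likely token; the seed never matters.
        return max(probs, key=probs.get)
    # Temperature handling is simplified here: any value > 0 just means "sample".
    rng = random.Random(seed)  # seeded PRNG -> reproducible choices
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(pick(0, 1) == pick(0, 2))      # True: seed is irrelevant at temperature 0
print(pick(0.7, 5) == pick(0.7, 5))  # True: same seed reproduces the same choice
print(pick(0.7, 1), pick(0.7, 2))    # may differ: different seeds can pick different tokens
```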


@oderwat commented on GitHub (Nov 13, 2024):

@d-kleine That would mean that with a fixed seed the result of a fixed temperature should be deterministic?


@d-kleine commented on GitHub (Nov 13, 2024):

Depends on what you mean by "fixed temperature". For temp = 0 (in practice, a very small number close to zero), yes; and for temp > 0, only under ideal conditions (there can be different calculations due to underlying hardware differences or model architecture complexities, though).

I think it's easier to distinguish using consistency and determinism:

  • fixed seed: ensures **consistency** when randomness is involved
  • temp=0: guarantees **determinism**

So, both settings contribute to reproducibility but from different sides: the seed controls randomness when it's present, and temperature controls whether randomness exists at all.


@oderwat commented on GitHub (Nov 13, 2024):

"temp=0: guarantees determinism"

Aha? Between different models and hardware? Really?

And of course we are all talking here about the same model, same hardware (and a temperature > 0.0), with the same seed vs. another seed.

What I think happens is the following:

The random generator used is not a (deterministic) pseudo-random generator, and that means the seed is useless if more than one number is needed.

Could that be the case?


@d-kleine commented on GitHub (Nov 13, 2024):

> Aha? Between different models and hardware? Really?

Yes, I was actually involved in some of those discussions around it, see for example https://github.com/ollama/ollama/pull/5760#issuecomment-2462917232 and https://github.com/ggerganov/llama.cpp/issues/8353#issuecomment-2304407109

> What I think happens is the following:
>
> The random generator used is not a (deterministic) pseudo-random generator, and that means the seed is useless if more than one number is needed.
>
> Could that be the case?

In this case, no - most modern LLMs (like Qwen) use a deterministic PRNG.


@jtyska commented on GitHub (Nov 13, 2024):

Thanks a lot @d-kleine I finally got it. I tested with a higher temperature and the same seed, and it indeed produced the same output!


@filijamz commented on GitHub (Mar 1, 2025):

I tried some experiments with llama3.2 with no memory (history) and random seed values. I kept asking for a riddle expecting a new riddle with a different seed each time. Instead, it responded with the exact same riddle each time. Am I misunderstanding what seed does?

Reference: github-starred/ollama#1000