[GH-ISSUE #7854] Different outputs for first and subsequent inferences after model load #51534

Closed
opened 2026-04-28 20:28:24 -05:00 by GiteaMirror · 5 comments

Originally created by @akamaus on GitHub (Nov 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7854

What is the issue?

The result I get right after the model is loaded into VRAM differs from the results of subsequent runs. It's easily reproduced and consistent.

After issuing ollama clean, I get output A on the first request and output B on every request after that. I tried several models (marco-o1 and qwen2.5) with both CPU inference (using the num_gpu=0 option) and GPU inference, and I observe this behavior in every case.

$ ollama clean qwen2.5
$ python -c 'from ollama import generate; gen1 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); gen2 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); gen3 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); print(gen1["response"] == gen2["response"], gen2["response"] == gen3["response"])'
False True
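The reproduction one-liner can be unrolled into a short script that makes the number of runs configurable and keeps the comparison logic separate from the server call. This is a sketch, assuming the ollama Python client (pip install ollama) and a local server with qwen2.5 pulled; collect_responses, pairwise_matches, ollama_generate and main are hypothetical helper names, not part of Ollama or its client.

```python
from typing import Callable, Dict, List

# Options from the report: temperature 0 with a fixed seed, so every run
# is expected to produce identical text.
OPTIONS: Dict[str, int] = {"temperature": 0, "seed": 0, "num_predict": 100}
PROMPT = "Sky is blue because"


def collect_responses(generate_fn: Callable[[str], str], prompt: str, runs: int) -> List[str]:
    """Send the same prompt `runs` times and return the generated texts in order."""
    return [generate_fn(prompt) for _ in range(runs)]


def pairwise_matches(responses: List[str]) -> List[bool]:
    """Compare each response with the next one, like gen1 == gen2 in the one-liner."""
    return [a == b for a, b in zip(responses, responses[1:])]


def ollama_generate(prompt: str, model: str = "qwen2.5") -> str:
    """Call a running Ollama server through the ollama Python client."""
    import ollama  # imported lazily so the helpers above can run without the client

    return ollama.generate(model=model, prompt=prompt, options=OPTIONS)["response"]


def main(runs: int = 3) -> None:
    # Unload the model first (e.g. `ollama stop qwen2.5`) so run 0 triggers a fresh load.
    responses = collect_responses(ollama_generate, PROMPT, runs)
    print(pairwise_matches(responses))
```

With the model unloaded and the server running, main(5) prints one boolean per adjacent pair of runs. With truly deterministic decoding every entry should be True; going by the results reported in this thread, the first pair would instead come out False and the rest True. Passing the generator in as a function also lets the comparison logic be checked with canned strings and no server.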

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.4.4

GiteaMirror added the bug label 2026-04-28 20:28:24 -05:00

@rick-github commented on GitHub (Nov 27, 2024):

$ ollama -v
ollama version is 0.4.4
$ ollama stop qwen2.5
$ python -c 'from ollama import generate; gen1 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); gen2 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); gen3 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); print(gen1["response"] == gen2["response"], gen2["response"] == gen3["response"])'
True True

What's the actual difference between the responses? Note that there are some other parameters (https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values) that can influence the output.

@akamaus commented on GitHub (Nov 27, 2024):

python -c 'from ollama import generate; gen1 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); gen2 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); gen3 = generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "num_predict": 100}); print("***\n", gen1["response"]); print("***\n", gen2["response"]); print("***\n", gen3["response"])'
***
 The sky appears blue during the day due to a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it collides with molecules and small particles in the air. Sunlight is made up of different colors, each of which has a different wavelength. Blue light waves are shorter and scatter more easily than other colors like red or yellow, which have longer wavelengths.

Rayleigh scattering causes these blue light waves to scatter in all directions. Since we see scattered light from every direction, the sky
***
 The sky appears blue during the day due to a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it collides with molecules and small particles in the air. Sunlight is made up of different colors, each of which has a different wavelength. Blue light waves are shorter and scatter more than other colors when they collide with these tiny particles.

This scattered blue light then travels in all directions, making the sky appear blue to an observer on Earth's surface. At sunrise or sunset,
***
 The sky appears blue during the day due to a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it collides with molecules and small particles in the air. Sunlight is made up of different colors, each of which has a different wavelength. Blue light waves are shorter and scatter more than other colors when they collide with these tiny particles.

This scattered blue light then travels in all directions, making the sky appear blue to an observer on Earth's surface. At sunrise or sunset,
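To answer the question of what actually differs between the responses, a word-level comparison that reports where two outputs first diverge is enough. A minimal sketch; first_divergence is a hypothetical helper, not part of the ollama client, and splitting on whitespace is a simplifying assumption that ignores spacing differences.

```python
from typing import List, Optional, Tuple


def first_divergence(a: str, b: str, context: int = 3) -> Optional[Tuple[int, str, str]]:
    """Return (word_index, snippet_a, snippet_b) at the first differing word, or None.

    Each snippet shows up to `context` words on either side of the divergence point.
    """
    words_a: List[str] = a.split()
    words_b: List[str] = b.split()
    # Advance past the common prefix of the two word sequences.
    i = 0
    while i < len(words_a) and i < len(words_b) and words_a[i] == words_b[i]:
        i += 1
    if i == len(words_a) == len(words_b):
        return None  # same words in the same order
    # Either a word differs at i, or one output ends at i (a prefix of the other).
    start = max(0, i - context)
    return i, " ".join(words_a[start:i + context]), " ".join(words_b[start:i + context])
```

Applied to the first and second responses above, it would flag the point where "scatter more easily than" in the first response becomes "scatter more than" in the second; everything before that point is identical.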

@akamaus commented on GitHub (Nov 27, 2024):

I use a custom-built version from nixos-master. Maybe that's the reason.

> Note that there are some other parameters (https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values) that can influence the output.

I use default settings. Which ones should I pay attention to?

@rick-github commented on GitHub (Nov 27, 2024):

top_p can be influential, but even with top_p set to 0, the results still vary:

[nix-shell:/]# ollama stop qwen2.5 ; python -c 'from ollama import generate; gen = [generate(model="qwen2.5", prompt="Sky is blue because", options={"temperature": 0, "seed":0, "top_p":0, "num_predict": 100}) for x in range(3)]; print([gen[x]["response"] == gen[x+1]["response"] for x in range(len(gen)-1)])'
[False, True]

It only seems to affect the first response: it differs from the second, but if the script does more generations, the subsequent ones all match. So it might be some initialization state in the runner. I'm not sure why it's affecting NixOS and apparently not Mint, though. Looking through the package definition, I don't see any patches that would change ollama's behaviour.
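Adjacent-pair booleans only hint at which run is the odd one out. To check the observation that only the first generation after a load differs, runs can be grouped by identical output. A sketch with a hypothetical helper, group_identical; it relies on Python dicts preserving insertion order (Python 3.7+).

```python
from typing import Dict, List


def group_identical(responses: List[str]) -> List[List[int]]:
    """Group run indices by identical output, in order of first appearance.

    With deterministic decoding there should be exactly one group.
    """
    groups: Dict[str, List[int]] = {}
    for index, text in enumerate(responses):
        # Runs with the same text share one list of indices.
        groups.setdefault(text, []).append(index)
    return list(groups.values())
```

For example, call group_identical([g["response"] for g in gen]) on the gen list built by the command above; the pattern described here would show run 0 in a group of its own and every later run in a second group.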

@jessegross commented on GitHub (Nov 27, 2024):

This is a variation of #5321; that bug has some more information that might be helpful. I'm going to close this one in favor of tracking it there.

Reference: github-starred/ollama#51534