Mirror of https://github.com/ollama/ollama.git (synced 2026-05-07 00:22:43 -05:00)
[GH-ISSUE #586] /api/generate with fixed seed and temperature=0 doesn't produce deterministic results
#46774 · Closed · opened 2026-04-27 23:57:00 -05:00 by GiteaMirror · 44 comments
Originally created by @jmorganca on GitHub (Sep 25, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/586
Originally assigned to: @BruceMacD on GitHub.
@sqs commented on GitHub (Sep 27, 2023):
I just noticed this as well.
~3 weeks ago, the following command was deterministic:
Now it is not.
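For reference, a minimal sketch of the kind of request being discussed, assuming a local Ollama server on the default port; the model name, prompt, and seed here are illustrative placeholders, not the original command:

import requests

# Hedged sketch: fixed seed plus temperature 0 via /api/generate.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",                 # illustrative model
        "prompt": "Why is the sky blue?",  # illustrative prompt
        "options": {"seed": 101, "temperature": 0},
        "stream": False,
    },
)
print(resp.json()["response"])
# Repeating this call with identical inputs is expected to return the same
# text; the report above is that it no longer does.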
@BruceMacD commented on GitHub (Oct 2, 2023):
Fixed in #663
@j2l commented on GitHub (May 23, 2024):
It's happening again.
@d-kleine commented on GitHub (Jun 26, 2024):
I just resolved my issue with the help of the Ollama API docs. The model parameters that make the model deterministic (and thereby reproducible) need to be passed via an "options" key in the JSON input for Ollama:
@j2l commented on GitHub (Jun 27, 2024):
Hey @d-kleine, do you mean you just tested it and it absolutely works for you?
Because as you can see in the message from Sep 27, 2023, we all use seed + temperature + num_ctx in options:
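The snippet referenced above was not captured by the mirror; an options block of the shape being described, with illustrative values, looks roughly like this:

# Hedged sketch of the options under discussion (values are examples):
# pin the seed, set temperature to 0, and fix num_ctx so the effective
# context window does not vary between runs.
payload = {
    "model": "llama3",                     # illustrative model
    "prompt": "Why is the sky blue?",      # same prompt every run
    "options": {"seed": 42, "temperature": 0, "num_ctx": 4096},
    "stream": False,
}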
@d-kleine commented on GitHub (Jun 27, 2024):
Yes, the output is deterministic and reproducible - on the same device with the same OS (in my case, Windows 10). However, if you use a different OS (I tested with Docker running an Ubuntu image on the same device), it will generate a similar but not identical output. So the output is deterministic and reproducible on the same OS, but I currently have trouble producing consistent output across different operating systems.
@j2l commented on GitHub (Jun 27, 2024):
Ok, thank you @d-kleine !
I use it in Docker on my Ubuntu host, maybe that's why.
@d-kleine commented on GitHub (Jun 27, 2024):
What I wanted to say is that when you switch OS while running the same code, you will get slightly different generated output; it's inconsistent across operating systems.
@d-kleine commented on GitHub (Jun 27, 2024):
It seems like even with the same model params (same prompt, same model, same options such as a fixed seed and temperature set to 0), the first generated output differs from the ones after it (the second generated output will be consistent with all following generated outputs).
@Nayar commented on GitHub (Sep 9, 2024):
@d-kleine commented on GitHub (Sep 9, 2024):
@Nayar Due to the model's architecture. So try a different model (e.g. gemma2 worked well for me across different OS) or wait for these to be merged/resolved:
#4632
https://github.com/ggerganov/llama.cpp/issues/8353
@jtyska commented on GitHub (Nov 13, 2024):
Hey everyone, I have the opposite problem. With temperature 0, the generated content is exactly the same even if the seed is set differently. Model: qwen2.5:72b, using options {"seed": 42 or 43 or 44 (always the same response), "temperature": 0}. Does anyone else have this problem? Any clue on how to fix it?
@d-kleine commented on GitHub (Nov 13, 2024):
@jtyska Because seed and temp=0 are for making the output reproducible. If you want to generate variable output each time you execute the generation process, don't use any seed and increase the temperature to a value >0 and <=2. You could try 0.3 or 0.7 first to see if this fits your requirements.
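As a rough sketch of that suggestion (model and prompt are illustrative): drop the seed and raise the temperature so runs can differ.

# Hedged sketch of the opposite setup: omit "seed" and set temperature above 0
# so each run can vary; 0.3 and 0.7 are the starting values suggested above.
payload_variable = {
    "model": "qwen2.5:72b",                # illustrative; the model mentioned above
    "prompt": "Tell me a short story.",
    "options": {"temperature": 0.7},       # no "seed" key
    "stream": False,
}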
@jtyska commented on GitHub (Nov 13, 2024):
Thanks for your reply @d-kleine.
I want it to be reproducible per seed value (this is usually how random seeds work, right?). In other words, for the same seed, I want the model to generate the same response, but for different seeds, different responses.
@jhpjhp1118 commented on GitHub (Nov 26, 2024):
I have one question.
I thought that num_ctx is fixed at its default value (2048) even without explicitly setting it to 2048. Does num_ctx change in every single execution?
@d-kleine commented on GitHub (Nov 26, 2024):
Sorry, I have just revised my statement from above. You are right, every language model has a fixed predefined context length, depending on the model itself (I always look it up for each model on HF). So, no, num_ctx does not change in every single execution when you use the same model (unless explicitly modified by the user or system).
@jhpjhp1118 commented on GitHub (Nov 26, 2024):
@d-kleine Thanks for your response.
Then I have another question.
I expected a deterministic response to be obtained by simply setting the temperature to 0. However, based on my experience, just setting the temperature to 0 gave slightly different responses, and a deterministic response could be obtained only when num_ctx was also set to an arbitrary fixed value. Why do I have to set num_ctx directly to get a consistent response, in addition to temperature? (This mainly occurred when num_ctx was shorter than the token length of the prompt.) If there is something I am mistaken about, please correct me.
@d-kleine commented on GitHub (Nov 26, 2024):
So the idea of setting temp=0 is to make a language model deterministic: it always picks the highest-probability token, ensuring the output is the same for a given input regardless of the seed (although it can still vary due to hardware differences, floating-point precision errors, and multithreading; see the linked issues above).
About num_ctx: you don't have to set this either - the maximum predefined context length is provided by the model. Sometimes it's helpful to shorten it to reduce resource usage and inference time. But if part of your prompt is cut off because num_ctx is too short, the model's understanding of the task or context changes, leading to variations in output even if other parameters like temperature are fixed at zero. Therefore you always have to ensure that num_ctx is large enough to fit your entire prompt without truncation.
@jhpjhp1118 commented on GitHub (Nov 27, 2024):
I don't understand this comment yet.
After fixing the temperature at 0 and experimenting further, the output is deterministic when num_ctx is longer than the prompt length, but when it is shorter the response was a little inconsistent. Even when num_ctx is shorter than the prompt length, I assumed that if the truncation method were consistent, the response would be consistent, because the truncated input would be consistent. In fact, even when num_ctx is shorter than the prompt length, most of the responses are consistent, but occasionally (less than 10% of the time) there are inconsistent cases. Why is this happening?
@d-kleine commented on GitHub (Nov 27, 2024):
Please provide sample code, especially the input, the settings, and the model you are using.
So, first of all, remember that most language models use subword tokenization. You can test this, for example, with the tokenizers OpenAI uses for their models, but there are many other tokenizers, each working differently. Therefore you need to check which tokenizer your model uses if you want to reduce the context length. And please remember: num_ctx covers the tokenized input plus the tokenized output, not the tokenized input only!
About your question on the inconsistency of the output despite temperature set to 0: this can happen for many reasons, for example because the value used internally is actually a very small number close to zero (to avoid division by zero), sampling techniques, etc. Therefore it's important to know which model you are using, its underlying architecture, and the techniques used in it.
To your question: if important parts of your input are cut off, the model receives incomplete context and can therefore produce inconsistent output. Even with temperature 0, where the model deterministically selects the highest-probability token, the reasons can include ties in token probabilities, training biases, and positional sensitivity (e.g., primacy/recency effects).
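A rough way to sanity-check that a prompt fits within num_ctx before relying on deterministic output; tiktoken's cl100k_base is used here purely as a stand-in tokenizer, since the models in this thread use their own vocabularies, so treat the count as an approximation.

import tiktoken

def fits_in_context(prompt, num_ctx, expected_output_tokens=512):
    # Approximate the prompt's token count with an OpenAI tokenizer as a proxy;
    # the real count depends on the model's own tokenizer.
    enc = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(enc.encode(prompt))
    # num_ctx must cover the tokenized input *and* the generated output.
    return prompt_tokens + expected_output_tokens <= num_ctx

print(fits_in_context("Hi! How are you? ...", num_ctx=2048))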
@jhpjhp1118 commented on GitHub (Nov 28, 2024):
This is my code. I use the llama3.1:8b model.
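A hypothetical sketch of the repeated-generation test described (placeholder prompt and options; not the poster's original code):

import requests

def generate(prompt, num_ctx=None):
    # Hypothetical reconstruction: temperature 0, optionally pinning num_ctx.
    options = {"temperature": 0}
    if num_ctx is not None:
        options["num_ctx"] = num_ctx
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1:8b", "prompt": prompt, "options": options, "stream": False},
    )
    return r.json()["response"]

# Run the same prompt repeatedly and count distinct responses (1 = fully consistent).
responses = {generate("some long test prompt ...") for _ in range(10)}
print(len(responses), "unique response(s)")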
There are 2 points I want to make after some more experiments.
Even with a shorter context length (num_ctx) than the prompt, it showed consistent responses as long as the pod was not changed. Therefore, I still think that even if num_ctx is shorter than the prompt length, responses will be consistent if temperature is 0. This means that num_ctx does not affect the consistency of responses as long as it is fixed.
@d-kleine commented on GitHub (Nov 28, 2024):
My two cents on that:
num_ctx is way too short; maybe increase it to 8192 (the original Llama 3 context length) or similar if you want to generate at least somewhat meaningful output for your input. Llama 3.1 8B supports 128k (actually ~131k) tokens.
@hissain commented on GitHub (Dec 4, 2024):
Worked for me,
@jtyska commented on GitHub (Jan 6, 2025):
I'm back here. I'm still getting some unexpected results with the temperature parameter. Please take a look at these 10 calls to /api/generate with the exact same prompt, temperature=0, and varying seeds (which shouldn't do anything, since the output should be deterministic, right?). The first part is the counter of the responses, and the second is the exact payload and received responses. If you do parallel requests with varying temperatures (10 seeds for each), for instance temperature=0 and temperature=10, this problem worsens, and temperature 0 generates several different responses to the same prompt. Could you test that and see if you can reproduce it? I suspect it is a real bug.
Response counter for Temperature 0 (same model, same prompt, different seeds)
{ "Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist with tasks such as writing essays, generating creative ideas, or even helping you learn new languages. Just let me know what you need assistance with!": 1,
"Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.\n\nIf you need any specific assistance, feel free to ask!": 9}
Payload/response details for each seed (the first response is different from the others)
Starting tests with temperature: 0...
Seed 0
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 0,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist with tasks such as writing essays, generating creative ideas, or even helping you learn new languages. Just let me know what you need assistance with!
############
Seed 1
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 1,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
Seed 2
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 2,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
Seed 3
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 3,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
Seed 4
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 4,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
Seed 5
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 5,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
Seed 6
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 6,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
Seed 7
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 7,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
Seed 8
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 8,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
Seed 9
PAYLOAD SENT to api/generate
{
"model": "qwen2.5:1.5b",
"stop": "<|endoftext|>",
"prompt": "Hi! How are you? I am testing the impact of different temperatures in the response generated by you, could you please generate any answer you with for this request?",
"options": {
"seed": 9,
"temperature": 0
},
"stream": false
}
RESPONSE GOT
Hello! I'm just an AI language model and don't have feelings or emotions like humans do. However, I can help you find information on a wide range of topics and assist you with tasks such as writing essays, creating presentations, answering questions about science, history, literature, etc.
If you need any specific assistance, feel free to ask!
############
...done with temperature: 0.
--
Response counter for parallel requests temperature=0 and temperature=10
Temperature = 0 - Count number by unique response
[
7,
2,
1
]
Temperature = 10 - Count number by unique response
[
1,
1,
1,
1,
1,
1,
1,
1,
1,
1
]
Response counter for parallel requests of 8 different temperatures
Temperature = 0 - Count number by unique response
[
4,
1,
1,
1,
2,
1
]
*all other temperatures generated 10 different responses
@jessegross commented on GitHub (Jan 6, 2025):
Yes, there is known non-determinism as a result of prompt caching (which affects the first response) and parallelism. It's because floating point does not give exactly the same results for different combinations of operations that are mathematically equal. The effect is more pronounced on some hardware - for example, Nvidia GPUs show it more than Apple.
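A tiny illustration of the floating-point effect described above: mathematically equal operations, grouped or ordered differently, need not produce identical results, which is exactly what changes when batching, caching, or parallel kernels reorder the math.

# Associativity does not hold for IEEE-754 floats.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False

# The same numbers summed in a different order give a different result.
print(sum([1e16, 1.0, 1.0, -1e16]))   # 0.0  (the 1.0s are absorbed by 1e16)
print(sum([1.0, 1.0, 1e16, -1e16]))   # 2.0  (the 1.0s are added first)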
@jtyska commented on GitHub (Jan 6, 2025):
Actually, the parallelism is in the processes that are making the API requests; the Ollama server is answering them sequentially. Each time I run multiple API requests to the same model with different seeds/temperatures (0 and some other value), I get completely random responses for temperature 0. This isn't supposed to happen, right?
About the prompt caching, is it possible to disable it?
@d-kleine commented on GitHub (Jan 7, 2025):
Not yet, see #5760
I think it would also be good to have these settings mentioned here: https://github.com/ggerganov/llama.cpp/issues/8353
These would make generated outputs truly deterministic
@d-kleine commented on GitHub (Jan 7, 2025):
@jtyska I have just seen this discussion here: https://github.com/ggerganov/llama.cpp/discussions/3005#discussioncomment-11151329
Have you tried this?
Edit: @jtyska I just saw that you asked in the linked thread: afaik there is no direct --sampling-seq option in Ollama. But as I understand this parameter from its documentation (see under "Sampling params"), it decides in which order the samplers are run.
So, have you tried just using "top_k"=1 to get consistently reproducible outputs?
Please also see Ollama's parameter docs for reference (just to show where I got the parameter from; its description is actually not quite precise: top_k considers the k most probable tokens at each generation step, so top_k=1 will always select the most probable token at each step of the generation process).
@d-kleine commented on GitHub (Jan 14, 2025):
@jtyska Did this resolve your issue or do you need further help with this?
@jtyska commented on GitHub (Jan 14, 2025):
Hey @d-kleine, I'm running multiple experiments with temperature/seed values and different models, but I'm not confident that the behavior is consistent. I'm waiting for my experiments to finish, and then I will analyze their outputs to check whether they are consistent. The workaround I'm trying, to avoid prompt caching and any other parallel interference, is not making the same model answer multiple parallel requests. Also, when a specific experiment with a given seed/temperature ends, I unload the model (keep_alive=0) and load it again with the new parameters. Let's see.
I will also try to figure out how to use top_k within the API when I have time. I'll let you know how it goes.
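For reference, a minimal sketch (not from the thread's original code) of passing top_k through the options field of /api/generate; with top_k=1 the sampler can only pick the single most probable token at each step. Model and prompt are illustrative.

import requests

payload = {
    "model": "qwen2.5:1.5b",    # illustrative; any local model works
    "prompt": "Hi! How are you?",
    "options": {"temperature": 0, "seed": 42, "top_k": 1},
    "stream": False,
}
print(requests.post("http://localhost:11434/api/generate", json=payload).json()["response"])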
@huynhducloi00 commented on GitHub (Feb 14, 2025):
This is a pretty bad issue. It basically means the platform answers differently for the same prompt. Imagine this is used in a yes/no setup: an answer of 'yes' is 180 degrees different from 'no'.
Without this being fixed, I don't see how Ollama can be used.
@d-kleine commented on GitHub (Feb 14, 2025):
That's not true - the seed still influences the initial probability distribution affecting the model's token selection process, even with a top_p (limiting the cumulative probability distribution) near 0 or a low top_k (restricting the number of candidate tokens). Also, setting those parameters to 0 (except temperature, and even that ends up close to 0) doesn't make sense imho.
@d-kleine commented on GitHub (Feb 14, 2025):
No problem - just to be precise, setting top_p to zero is technically fine when temperature is zero, since that overrides nucleus sampling anyway (adjust either temperature or top_p, but not both simultaneously). But top_k must remain ≥1 to allow token selection.
@huynhducloi00 commented on GitHub (Feb 14, 2025):
The issue still happens with top_k=1, seed=42, temperature=0, and top_p at its default (I don't know the value). It does not make any sense. I ask it the same prompt: the first time it answers 'yes', the second time 'no'. So should I report the answer as yes or no in the paper? lol. This is really of no use.
@d-kleine commented on GitHub (Feb 14, 2025):
@huynhducloi00 Please provide the model (and code, if possible)
@huynhducloi00 commented on GitHub (Feb 14, 2025):
It can be reproduced with the model 'deepseek-r1:14b'.
prompt="""
download from here https://justpaste.it/gk2xt
"""
It sometimes gives the answer yes, sometimes no, especially when you run a different prompt in between.
@d-kleine commented on GitHub (Feb 14, 2025):
@huynhducloi00 Please try with options={"temperature": 0, "seed": 42, "top_k": 1, "top_p": 1}; at least on my end (GPU) it's consistently reproducible across multiple runs (with r1 1.5b). Please let me know if that suits your requirements.
Edit: The prompt should be further engineered, e.g. making it more explicit that it's a question and what the answer format should look like, e.g. if you want it to output only "Yes" or "No". For example, I added a "?" (indicating a question) and specified the desired output format a little more. This makes the input clearer for the model, which has implications for the token selection process in the output.
@huynhducloi00 commented on GitHub (Feb 14, 2025):
Thanks a lot for the response. That helps a lot
@huynhducloi00 commented on GitHub (Feb 15, 2025):
sadly, "top_p" does not solve it:
Here is the result of 10 questions:
They are off (different) at index 7.
model being test is deepseek-r1:14b. The option is
"temperature": 0, "seed": 42, "top_k": 1, 'top_p':1,'num_ctx':10000I am not sure why this is not being prioritized. This is very easy to reproduce. Is it due to the randomness in the quantization. I guess deepseek-r1:14b is a quantized model.
@d-kleine commented on GitHub (Feb 16, 2025):
Have you tried without limiting the context length (num_ctx)?
There can be numerous other reasons that introduce randomness, e.g.:
@dan31 commented on GitHub (Mar 26, 2025):
Wow, this is disastrous. We encounter this a lot now. What is the true reason quantized models cannot produce deterministic outputs when all sources of non-determinism are turned off in Ollama? For us this is completely serial execution on a GPU by a single client, with a Q4_K_M model, the most popular quantization format.
@kevin-pw commented on GitHub (Mar 26, 2025):
I wrote a longer comment in the related (and still open) issue here: https://github.com/ollama/ollama/issues/5321#issuecomment-2755465128, but the main takeaway is: several models also produce inconsistent embeddings for the same inputs, which can significantly affect the quality of downstream applications, for example RAG. That also means sampler options (like temperature, seed, top_k, etc.) cannot fix this issue.
@flexorx commented on GitHub (Mar 26, 2025):
@kevin-pw do you know if this issue was recently introduced?
I couldn't find any mentions of a similar issue until recently.
@huynhducloi00 commented on GitHub (May 9, 2025):
Yes, not sure why the team does not prioritize this.