Mirror of https://github.com/ollama/ollama.git (synced 2026-05-07 00:22:43 -05:00)
Closed · opened 2026-05-03 13:29:48 -05:00 by GiteaMirror · 95 comments
Originally created by @freQuensy23-coder on GitHub (Feb 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2415
Originally assigned to: @BruceMacD, @ParthSareen on GitHub.
Feature request:
How can I get logits (probabilities of each next token) during generation, just like I can with the OpenAI API (logprobs)? This feature would be helpful for apps that use logprobs to measure model awareness and confidence.
@neychevr commented on GitHub (Mar 25, 2024):
Hello! Seems like a really good feature for more complex usage of Ollama.
Is this feature already a work in progress, or would a contribution be welcome?
UPD: seems there is already a pending PR with this feature implemented: https://github.com/ollama/ollama/pull/1640
Could we help somehow to speed up the merge? :)
@josiahbryan commented on GitHub (Apr 19, 2024):
This would be super helpful to some ongoing research work I'm doing. Does anyone know of any providers that DO return logprobs, other than OpenAI of course? Any ETA when this might land here in ollama?
@mateon1 commented on GitHub (Apr 27, 2024):
I would also like to have this, I'm interested in having both echo + logprobs, so I can get information about the prompt too, instead of just the completion. Right now I'm using very small models with pytorch to compute logits directly, but that's really slow.
@magic-YuanTian commented on GitHub (May 2, 2024):
Any updates?
@briancleland commented on GitHub (May 6, 2024):
https://github.com/ollama/ollama/pull/1640#issuecomment-2043381653
@SharmaM-dev commented on GitHub (Jul 14, 2024):
Any updates?
@drdsgvo commented on GitHub (Jul 29, 2024):
Are there any updates on this very important issue? To not implement logits is not a valid solution. Anyone (including me) who needs logits will move from Ollama to a different solution! Please be aware of that.
@The-Inscrutable-X commented on GitHub (Aug 9, 2024):
support, would be very nice
@moritz-gross commented on GitHub (Aug 30, 2024):
I'm surprised this is not one of the first things implemented 🤔
@haukelicht commented on GitHub (Sep 11, 2024):
Hi there,
any progress on integrating this feature request?
@szocsbarni commented on GitHub (Sep 19, 2024):
Hi, is there a timeline available for integration?
@mommi84 commented on GitHub (Sep 19, 2024):
We are going to get AGI before this, 100%!
@SharmaM-dev commented on GitHub (Sep 19, 2024):
[laugh] Mridul Sharma reacted to your message.
@martinkozle commented on GitHub (Sep 19, 2024):
We are going to get GTA 6 before this.
@latent-variable commented on GitHub (Oct 16, 2024):
man, I really need this to implement a CoT-Decoding pipeline for open-webui. I guess I'll go back to playing Sparking Zero.
@josiahbryan commented on GitHub (Oct 16, 2024):
I've given up hope and switched back to llama.cpp for production inference. Using it with ramalama, which can pull from the ollama model library.
Really disappointed that the maintainers here show such disregard for such a huge community request.
Makes me want to make sure I don't use the project in any way. If an obvious thing like this is being totally ignored by the maintainers, then it shows they don't really care much about what the community is asking for.
@briancleland commented on GitHub (Oct 16, 2024):
@jmorganca @bmizerany Has the team given up on implementing this feature?
@NumberChiffre commented on GitHub (Oct 19, 2024):
Pls make this happen lol, as a painful user on Mac :(
@josiahbryan commented on GitHub (Oct 19, 2024):
@jmorganca @bmizerany you guys broadcast your partnership with Hugging Face - great! What about this though? This seems like less than 1/10th the effort - why are you ignoring everyone asking for input here? Why don't you at least provide a timeline?
@athmanar commented on GitHub (Oct 22, 2024):
Insane that this is not given as an option. Maybe better to switch to pure Hugging Face models.
@codelion commented on GitHub (Oct 28, 2024):
Cot decoding and entropy decoding are available in optillm - https://github.com/codelion/optillm
@Cy-Fi commented on GitHub (Oct 29, 2024):
Ollama will stop being an option for us if crucial features like this are not being implemented...
@magic-YuanTian commented on GitHub (Nov 3, 2024):
For such a simple but important feature, the team demonstrates unexpected arrogance and ignorance over such a long time. I think this is a red flag for us to give up using Ollama as LLM backend since they cannot go far for sure.
@codelion commented on GitHub (Nov 3, 2024):
I have implemented it in PyTorch if anyone is looking for it they can use the following colab - https://colab.research.google.com/drive/1zPv47_tog2_KOFJY-WJxwPYR6mgoxKlK?usp=sharing
Here is the discussion on optillm where it was brought up as well - https://github.com/codelion/optillm/discussions/82
@codelion commented on GitHub (Nov 13, 2024):
Logprobs are now directly supported in our local inference server - https://github.com/codelion/optillm?tab=readme-ov-file#local-inference-server we use the OpenAI compatible API so you can get them using the same code.
@jooray commented on GitHub (Nov 13, 2024):
This would be very useful for enforcing the structure of output (output_cls with langchain, that currently works with llama.cpp and hugging face).
It can reject tokens that would break the output structure. Very useful for tool calling as well.
@drdsgvo commented on GitHub (Nov 13, 2024):
Great to hear that we have a cool alternative to ollama as those guys are not doing what needs to be done!
@jooray commented on GitHub (Nov 14, 2024):
I would suggest being kinder. Ollama is an open source project; they are not working for you. Feel free to offer a bounty to implement this, or create a pull request.
I would really like to see this implemented, but that does not mean I have to be mean to authors of a software I get for free. And being an ass in comments (which many of you are here) is not very motivating for developers either.
@ParthSareen commented on GitHub (Dec 9, 2024):
Hey everyone! Sorry for the delay and the lack of updates here - going to be picking this up soon and hopefully getting it in early Jan. There have been a ton of changes to the API, with even more coming on the inference engine layer, so I just need to be a bit careful since this would be an API addition, but it is something we want to support!
@ParthSareen commented on GitHub (Dec 13, 2024):
Hey folks would like to get your thoughts:
Would you care if it was logits vs logprobs? Would you prefer one over the other? If so why?
Working on designing the API and getting the functionality right along with that :) Appreciate your patience!
@josiahbryan commented on GitHub (Dec 13, 2024):
Personally would prefer logprobs, just because all my tooling is set up for that and I'm used to thinking in logprobs haha.
@martinkozle commented on GitHub (Dec 13, 2024):
But if you want to calculate the probability of the LLM generating "yes" or "no", for example, you would either have to use constrained generation with logprobs, where only those 2 tokens will be nonzero, or you can directly use the logits and do the constraining yourself (as in the sketch below).
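To illustrate the second approach, here is a minimal sketch (assuming you already have the raw logit values for the two candidate tokens from some backend that exposes them): a softmax restricted to just those logits yields the constrained probabilities.

```python
import math

def constrained_probs(logits: dict[str, float]) -> dict[str, float]:
    """Softmax over a restricted set of candidate tokens.

    `logits` maps each allowed token (e.g. "yes", "no") to its raw logit.
    Subtracting the max first keeps exp() numerically stable.
    """
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logit values for the two allowed answers:
print(constrained_probs({"yes": 3.1, "no": 1.4}))  # {'yes': ~0.85, 'no': ~0.15}
```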
@mommi84 commented on GitHub (Dec 13, 2024):
Definitely logprobs so that it doesn't deviate from the OpenAI standards (see examples here) and the following can be supported:
@ParthSareen commented on GitHub (Dec 13, 2024):
Thanks @josiahbryan, @martinkozle, @mommi84. I also think it makes more sense to have logprobs for now and then happy to re-evaluate when doing the new engine. Have some fun plans for sampling :)
@Elimane0800 commented on GitHub (Dec 15, 2024):
Personally I prefer logits, because we can derive a kind of logprobs ourselves if we have the logits. Plus, for tasks such as distillation or uncertainty observation, logits are more interesting to have. So please add logits 🙏, we can do the logprobs calculations ourselves.
@Elimane0800 commented on GitHub (Dec 15, 2024):
To be more precise: logprobs can be derived directly from logits through a softmax followed by a logarithmic transformation. This kind of operation is neither the most difficult nor the most time-consuming. It's easy to get logprobs from logits, but not so easy to get logits from logprobs, unless you add a scale constant to get an approximation of the logit values.
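For reference, the transformation described above is just a log-softmax; a minimal NumPy sketch (assuming a full vector of logits is available) looks like this:

```python
import numpy as np

def logprobs_from_logits(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log(softmax(logits)).

    Shifting by the max before exponentiating avoids overflow.
    """
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

logits = np.array([2.0, 1.0, 0.1])
logprobs = logprobs_from_logits(logits)
print(logprobs, np.exp(logprobs).sum())  # probabilities recovered with exp() sum to 1
```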
@BenjaminMarechalEVITECH commented on GitHub (Dec 20, 2024):
I disagree with @Elimane0800: to get logprobs from logits, one needs to compute log-softmax over the logits of the complete vocabulary, which is an expensive operation and is already done in the model for the inference of the next token.
Moreover, it is often only necessary to get the N most probable tokens (with N on the order of a few dozen). In this case, logprobs (or probs) are relevant; logits are not.
@Elimane0800 commented on GitHub (Dec 20, 2024):
Having already tried to calculate logprobs from logits on a CPU, I can say it is definitely not the type of task that runs for 24 hours. So if the only concern is the time and cost of the operation, I can say it is neither the most time-consuming nor the most expensive operation.
@BenjaminMarechalEVITECH commented on GitHub (Dec 20, 2024):
There are two use cases:
I think the first use case is the most requested today. To answer it effectively, the API has no choice but to give the logprobs (or probs) of the N most probable tokens. Indeed, if the API only provides the logits, then it must provide them for the entire vocabulary if we want to deduce the logprobs. With vocabulary sizes sometimes approaching 100k, this overloads the API's JSON response enormously.
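To make the payload concern concrete, a rough back-of-the-envelope estimate (the ~10 bytes per serialized float is an assumption, not a measurement):

```python
vocab_size = 100_000       # vocabulary size cited above
bytes_per_value = 10       # rough size of one float rendered as JSON text
generated_tokens = 200

per_token_bytes = vocab_size * bytes_per_value       # ~1 MB of logits per generated token
response_bytes = per_token_bytes * generated_tokens  # ~200 MB for a 200-token completion
print(per_token_bytes, response_bytes)
```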
@ParthSareen commented on GitHub (Jan 3, 2025):
Hey folks, thanks for your patience! Going to be doing something like top k log probs for the API. Not saying no to logits, but given the comments, API design, and system constraints it just makes sense to do this first.
There's also some adjacent work going on right now in which there's some API refactoring/design to be done. Upon its completion, this will be one of the first things to go out :)
@OriginalGoku commented on GitHub (Jan 25, 2025):
any timeframe for this update?
@codelion commented on GitHub (Jan 25, 2025):
It may take a while to get it here since they rely on llama.cpp underneath. You can try an alternative like optillm - https://github.com/codelion/optillm
@ClaudiuCreanga commented on GitHub (Jan 30, 2025):
Seems like this one is OpenAI tooling, while we're interested in other models.
@ParthSareen commented on GitHub (Jan 30, 2025):
Hey everyone - sorry for the delay, making my rounds right now.
It is not due to relying on llama.cpp; I did have a branch working back in early Jan. As you may know, we've been working on a new Go engine, and all of us have been pretty heads-down on that for the last month. There's going to be some bifurcation between running the new engine vs. the current one, which means that new API features need to have parity on both engines. To do that, we just need to make sure the behavior of the old and new engines is the same for the logprobs endpoint as well as tokenize/detokenize.
I've also been mainly working on our new sampling interfaces, and the logprobs feature has been top of mind as I build them. I do hope to get to it soon - I know it's super important to you all! The new engine is going to bring a lot of stability and maintainability throughout, so we can support these kinds of features much faster in the future.
@codelion commented on GitHub (Jan 30, 2025):
This uses the same format and API as OpenAI but works for any model from hugging face. You can use it to build datasets for distillation like this - https://huggingface.co/datasets/arcee-ai/LLama-405B-Logits
@chaoyupeng commented on GitHub (Feb 18, 2025):
Hi Ollama team, any updates on the logprobs functionality?
@BruceMacD commented on GitHub (Feb 20, 2025):
I've taken over implementation of this, expect some progress soon.
@BruceMacD commented on GitHub (Feb 27, 2025):
Update for those interested:
I've opened some pull requests that refactor the model runners to make it possible to get information such as logprobs:
https://github.com/ollama/ollama/pull/9282
Once that gets in I'll move forward with returning the values from the Ollama server.
@SabaPivot commented on GitHub (Mar 5, 2025):
Cool! This is really cool!
@SeriousJ55 commented on GitHub (Mar 19, 2025):
Hello! Do you know when this feature will be implemented? Pull request #9282 seems to be pending.
And thanks to the people in the dev team for their amazing work!
@BruceMacD commented on GitHub (Mar 19, 2025):
I'm still working on this at the same time as a few other things, but I haven't forgotten about it. Updated the #9282 pull request, so hopefully that one gets in soon.
@K0IN commented on GitHub (Apr 7, 2025):
Hey, I know this might be the wrong thread, and it is surely documented somewhere, but why did ollama switch from the llama.cpp server to a custom implementation in the first place?
@aakash232 commented on GitHub (Apr 9, 2025):
Hello, Any updates on this feature?
@qzhou711 commented on GitHub (Apr 9, 2025):
Thank you very much for the efforts of the developers. Is there any update to this feature? This is a very important feature.
@BruceMacD commented on GitHub (Apr 14, 2025):
Still working on it! I've taken a diversion to work on some other stuff, but further steps toward this are still in my short-term plans.
@brunodifranco commented on GitHub (Apr 21, 2025):
Hello! Any estimate on when the feature will be completed?
@Proteusiq commented on GitHub (Apr 28, 2025):
How can we help?
@CodeHatchling commented on GitHub (May 1, 2025):
Please consider merging in pull request #9282 that implements this very needed feature.
@BarryKeee commented on GitHub (Jun 26, 2025):
Any update on this? This feature is very much needed!
@enochlev commented on GitHub (Jun 30, 2025):
May I offer a statement of motivation: the completion of this feature is critical for a wide range of research. Enabling Ollama to expose raw logits allows researchers to extract high-quality supervision signals from large models like LLaMA 70B, even with limited hardware. This supports efficient knowledge distillation, making it possible to train smaller models with soft targets on modest GPUs like a single L40S or a 24 GB GPU, which I would argue is the hardware 95% of researchers are limited to. Unlocking this capability would be a major contribution to the LLM research community.
Most paid LLM hosting services prohibit the extraction of logits (all except for OpenAI), and we are limited to using vLLM, which is quite GPU-hungry.
Again, we understand this contribution comes from your free time, and it is much appreciated.
@SharmaM-dev commented on GitHub (Jun 30, 2025):
[like] Mridul Sharma reacted to your message.
@baptistejamin commented on GitHub (Jul 1, 2025):
It can be used for other things besides research. At Crisp, we use logprobs with fine-tuned models for binary classification on highly complex queries.
It can be super helpful.
You can then have the log prob for SPAM, and use thresholds to classify.
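A minimal sketch of that thresholding idea, assuming the top logprobs for the first generated token are already available as a token-to-logprob mapping (the label names and threshold here are hypothetical):

```python
import math

def is_spam(top_logprobs: dict[str, float], threshold: float = 0.8) -> bool:
    """Return True when the model assigns the "SPAM" token a probability above threshold."""
    logprob = top_logprobs.get("SPAM", float("-inf"))
    return math.exp(logprob) >= threshold

# Hypothetical top logprobs for the first output token of a fine-tuned classifier:
print(is_spam({"SPAM": -0.05, "HAM": -3.2}))  # exp(-0.05) ≈ 0.95 -> True
```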
@jaylinwylie commented on GitHub (Jul 15, 2025):
Looks like we are waiting on the pull request to be approved. Unless there's a branch/fork we can experiment with in the meantime?
@unacceptable commented on GitHub (Jul 16, 2025):
I am surprised that @jmorganca or @rick-github haven't closed this out yet saying to use a proxy. That's what they did in #1053 and #8573 for auth and token usage.
@rick-github commented on GitHub (Jul 16, 2025):
A proxy does not have access to logits. See here to learn more about logits.
@kjam commented on GitHub (Aug 12, 2025):
Commenting to also say this is useful for implementing privacy and security controls, such as differential privacy predictions and regularization when managing adversarial input. Would be nice to merge https://github.com/ollama/ollama/pull/9282
@codelion commented on GitHub (Aug 12, 2025):
Depends on how it is implemented, OptiLLM is also a proxy but has an inbuilt local inference server and supports logits - https://github.com/codelion/optillm/issues/182
@Tritonio commented on GitHub (Aug 12, 2025):
Is optiLLM able to get log-probs from Ollama though? I may be wrong but from a cursory look at the code it looks like it uses torch internally to get the logits, so if that is the case it's not acting as a proxy in front of ollama when it does so.
@codelion commented on GitHub (Aug 12, 2025):
No, it is not through Ollama; you do not need ollama. You can do the inference directly in OptiLLM with the built-in server, which provides full logits via the standard OpenAI-compatible API.
@rick-github commented on GitHub (Aug 12, 2025):
The issue is about getting logits from ollama, not optillm. Please don't distract from the issue.
@codelion commented on GitHub (Aug 12, 2025):
OptiLLM is just an alternative. This issue has been open for over 18 months; if ollama wanted to implement it, they would have done it by now.
@rick-github commented on GitHub (Aug 12, 2025):
Feel free to use optillm. Others would like to get logits from ollama. If you want to discuss alternatives, open a new issue. This issue is for getting logits from ollama.
@SharmaM-dev commented on GitHub (Aug 12, 2025):
[like] Mridul Sharma reacted to your message.
@VKrishna04 commented on GitHub (Aug 28, 2025):
yes please add it
@CodeHatchling commented on GitHub (Sep 7, 2025):
I figured I'd throw in a couple of the many possible use cases for a feature like this.
Suppose you wanted the LLM to perform in a multiple choice type situation, or any other scenario where you want the model to determine the best-fitting response from a finite selection of options. The cleanest way to do this would be to evaluate the probability of each option and select the highest scoring one. This eliminates the need to handle unexpected responses from the model.
Another case is assisted writing, where the top N continuations are offered instead of the usual single continuation, such as with my tokenscape project on github.
Cheers!
@martinkozle commented on GitHub (Sep 9, 2025):
Ollama does have structured outputs which you can use for this use-case.
https://github.com/ollama/ollama/blob/main/docs/openai.md#structured-outputs
https://ollama.com/blog/structured-outputs
In the Pydantic model use an enum with the valid options.
Not exactly what you described, and it doesn't excuse the lack of a logits feature, but it may be useful if you need it for this specific case (a quick sketch follows below).
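A minimal sketch of that suggestion using the ollama Python client and a Pydantic enum (the model name and answer options are placeholders; the format=schema pattern follows the structured-outputs docs linked above):

```python
from enum import Enum
from pydantic import BaseModel
from ollama import chat

class Capital(str, Enum):
    paris = "Paris"
    london = "London"
    berlin = "Berlin"

class Answer(BaseModel):
    choice: Capital

resp = chat(
    model="llama3.2",                   # placeholder model name
    messages=[{"role": "user", "content": "Which city is the capital of France?"}],
    format=Answer.model_json_schema(),  # constrain the output to the schema
)
print(Answer.model_validate_json(resp.message.content).choice)
```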
@Abdulrahman392011 commented on GitHub (Sep 21, 2025):
Hey guys, so after this long read I can't really tell when I can expect this to be available.
I also want to say that ollama already provides logprobs in the openai-python compatibility layer, so I don't really understand what the hold-up is. Logically speaking it should be almost copy and paste with some modifications. Can someone please correct me?
@rick-github commented on GitHub (Sep 21, 2025):
Ollama does not make logprobs available via either the ollama API or the OpenAI-compatible API.
@Abdulrahman392011 commented on GitHub (Sep 21, 2025):
Yes, you're right. I mistakenly saw logprobs listed at https://docs.ollama.com/openai, but it was unchecked. Sorry for the confusion.
@Abdulrahman392011 commented on GitHub (Sep 21, 2025):
So what is needed to do this, and why is it not implemented yet? I know that ollama runs a private instance of llama.cpp, and llama.cpp does provide logprobs, so in my mind it's more like an option that just needs modification when the llama.cpp instance is initiated.
I am here to learn, so please correct me.
@rick-github commented on GitHub (Sep 21, 2025):
This issue was opened before llama.cpp provided OpenAI compatible logprobs. In the meantime, ollama has migrated away from llama.cpp as the primary backend. Work to support logprobs needs to be done on the new ollama engine. The main developers are busy with other tasks.
@baptistejamin commented on GitHub (Nov 1, 2025):
I just released this PR containing logprobs: https://github.com/ollama/ollama/pull/12899
You can try with:
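As a rough, hedged sketch only (not the command from the PR): a request along these lines should exercise the feature through the OpenAI-compatible endpoint, assuming the standard OpenAI-style logprobs/top_logprobs fields mentioned later in the thread and Ollama's default local port.

```python
import json
import requests

# Assumptions: Ollama's OpenAI-compatible chat endpoint on the default port,
# accepting the standard OpenAI logprobs/top_logprobs request fields.
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3.2",  # placeholder model name
        "messages": [{"role": "user", "content": "Answer yes or no: is water wet?"}],
        "logprobs": True,
        "top_logprobs": 5,
    },
    timeout=60,
)
print(json.dumps(resp.json()["choices"][0]["logprobs"], indent=2))
```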
@tobiaswuerth commented on GitHub (Nov 6, 2025):
+1
@rick-github commented on GitHub (Nov 13, 2025):
https://github.com/ollama/ollama/releases/tag/v0.12.11
@jmorganca commented on GitHub (Nov 13, 2025):
Wanted to say a huge thanks to @baptistejamin for the PR that got this in! And thank you to @BruceMacD who did some original work around this, @jessegross for the reviews and @ParthSareen for some fit and finish on the feature 🎉
Thanks for closing this @rick-github 😊
@neuhaus commented on GitHub (Nov 24, 2025):
I came across this new Ollama API feature while wondering how to implement the function described in the paper
"LLMs can hide text in other text of the same length" (Norelli & Bronstein, 2024/2025)" using the API.
The logprobs and top_logprobs endpoints return only the top N most likely tokens. I believe the API does not provide the full probability distribution over the entire vocabulary, nor does it allow efficient querying of a specific token's rank if it falls outside the top N.
I was wondering if this missing functionality could be added to the API.
@codelion commented on GitHub (Nov 24, 2025):
You can use OptiLLM for it if you want, it provides the full API - https://github.com/algorithmicsuperintelligence/optillm/issues/182
@rick-github commented on GitHub (Nov 24, 2025):
To avoid confusion, OptiLLM is an alternative to ollama, it does not provide the full probability distribution for ollama models.
@Bottlecap202 commented on GitHub (Nov 24, 2025):
Integrate "memlayer" locally
@Bottlecap202 commented on GitHub (Nov 24, 2025):
And use a coral edge TPU, they are public now.
@baptistejamin commented on GitHub (Nov 24, 2025):
Just to better understand: would you like to build a production product/inference system from this paper?
IMO, what you need is a lower-level API, such as llama.cpp, which is made for this.
Ollama's philosophy is to be similar to the OpenAI API and easily plug & play.
@Bottlecap202 commented on GitHub (Nov 24, 2025):
Kobold cpp.
@Bottlecap202 commented on GitHub (Nov 24, 2025):
Built around llama.cpp, BUT EASIER TO USE.
@Bottlecap202 commented on GitHub (Nov 24, 2025):
AND it has a community based horde. For all ages.
@Bottlecap202 commented on GitHub (Nov 24, 2025):
Kobold lite