Mirror of https://github.com/ollama/ollama.git (synced 2026-05-06 16:11:34 -05:00)
Closed · opened 2026-05-03 16:45:39 -05:00 by GiteaMirror · 17 comments
Originally created by @ParisNeo on GitHub (Apr 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3582
Originally assigned to: @ParthSareen on GitHub.
What are you trying to do?
I would like to propose adding tokenize and detokenize endpoints to the Ollama server. This feature is crucial for Ollama client interfaces (such as lollms) to prepare prompts effectively and to accurately estimate token counts for the LLMs. Currently, the client uses tiktoken for tokenization, which is not ideal because tokenization depends on the model. While this can work with ChatGPT-compatible models, it may miscount tokens for other models, leading to suboptimal token budgeting and, in some cases, errors when the number of requested tokens exceeds the model's context capacity.
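To illustrate the status quo described above, a client-side estimate today typically looks something like the sketch below; the choice of the cl100k_base encoding is just an example of an OpenAI-style vocabulary, not necessarily what lollms uses.

```python
# Sketch of the current client-side workaround: approximate token counts with
# tiktoken, whose vocabulary does not match the tokenizer of the model that
# Ollama actually loads, so the count is only an estimate.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # OpenAI-style BPE, not the model's own

def estimate_tokens(text: str) -> int:
    # A Llama- or Mistral-family tokenizer may split the same text into more or
    # fewer tokens, so prompt budgets based on this count can overflow the
    # model's real context window.
    return len(enc.encode(text))

print(estimate_tokens("Hello, world!"))
```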
How should we solve this?
Introduce two new endpoints, one for tokenization and another for detokenization, to the Ollama server:
Tokenize Endpoint: converts input text into the token IDs produced by the loaded model's own tokenizer.
Detokenize Endpoint: converts a list of token IDs back into text.
These endpoints should return the correct tokens or text for the model currently in use.
The tokenization endpoint should provide accurate token counting tailored to the specific LLM being used. This ensures correct token budgeting and helps avoid errors caused by exceeding the model's context capacity.
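As a rough illustration of how such endpoints could be used from a client, here is a minimal sketch. The paths /api/tokenize and /api/detokenize and the field names model, content, and tokens are assumptions for this example, not an existing Ollama API.

```python
# Hypothetical client usage; endpoint paths and JSON field names are assumptions.
import requests

OLLAMA = "http://localhost:11434"

def tokenize(model: str, content: str) -> list[int]:
    r = requests.post(f"{OLLAMA}/api/tokenize", json={"model": model, "content": content})
    r.raise_for_status()
    return r.json()["tokens"]

def detokenize(model: str, tokens: list[int]) -> str:
    r = requests.post(f"{OLLAMA}/api/detokenize", json={"model": model, "tokens": tokens})
    r.raise_for_status()
    return r.json()["content"]

# Model-accurate token count for prompt budgeting, plus a round trip back to text.
toks = tokenize("mistral:latest", "Hello, world!")
print(len(toks), "tokens")
print(detokenize("mistral:latest", toks))
```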
What is the impact of not solving this?
Without these endpoints, users might have to continue relying on inefficient or suboptimal solutions for tokenizing and detokenizing text data.
Anything else?
Include documentation and examples demonstrating how to use these new functionalities effectively. Providing comprehensive guidance will help users quickly adopt these features and enhance the overall user experience.
@chigkim commented on GitHub (May 2, 2024):
Related issues to keep an eye on: https://github.com/ollama/ollama/issues/1716
@rohitgr7 commented on GitHub (Mar 25, 2025):
hey guys! any update on this?
@ParthSareen commented on GitHub (Mar 25, 2025):
Hey @rohitgr7! Haven't forgotten about this, but with the new engine work we're still figuring out what support would look like across the new and old engines, since some of this depends on model loading. Will keep you all posted!
@raffaeler commented on GitHub (Jun 20, 2025):
I see 3 different unreviewed PRs about this.
Can someone clarify if this is going to happen or not?
I don't get the reasons behind the blocks.
Thanks
@icedmoca commented on GitHub (Jun 26, 2025):
Surprised this still isn’t implemented given how many issues and PRs have piled up around it. Tokenization isn’t optional anymore — it’s critical for logit biasing, context estimation, and any serious token-level intervention (e.g., memory, contradiction suppression).
Relying on HF tokenizers is a hack — GGUF models often diverge, and without a native /tokenize, we can’t trust logit-level control to be accurate.
Ollama handles generation, so why not expose the tokenizer driving it? This should’ve shipped a long time ago.
@ParthSareen commented on GitHub (Jun 28, 2025):
Hey @icedmoca, @raffaeler, I empathize and agree that we should have this, but it's a nontrivial implementation until a few more things are cleared up.
I had recently taken another look at my PR: https://github.com/ollama/ollama/pull/8106
There are a few open issues we just need more information on in order to provide a good experience for the years ahead.
There's some other stuff in the works too which makes this a bit more complicated. Will keep you all posted but will continue tracking in my PR. Going to close this one out for now.
@raffaeler commented on GitHub (Jun 29, 2025):
@ParthSareen I understand and thank you and the other contributors for all the efforts in Ollama.
I am asking what is coming because I have already had to migrate my talk demos from Ollama to the Python transformers library because of this missing feature.
As others said, token counting is one of the many vital features.
Thanks again.
@icedmoca commented on GitHub (Jul 24, 2025):
My workaround for my project forced me to use vector-based semantic similarity and structured fact representation. 🥀
@raffaeler commented on GitHub (Jul 24, 2025):
Could you please elaborate a bit more?
If you can't estimate the token count, you are forced to repeatedly try to tokenize and see whether there is an error or not.
Since my algorithm for chunking documents is quite complex, this would make it take a very (too) long time.
Because of this missing feature, I had to migrate to the Hugging Face libraries to host the model.
Does your workaround solve this issue?
@icedmoca commented on GitHub (Jul 24, 2025):
Yes, my workaround avoids token estimation entirely by operating at a higher semantic level. Instead of relying on tokenizer length constraints, I parse inputs into structured fact triplets and group them by semantic similarity using embedding vectors (like en_core_web_lg or MiniLM). This allows chunking and context window management to be driven by meaning rather than raw token count.
It’s not perfect for token-level logit biasing, but for document chunking, summarization, and reasoning, it avoids the need for a tokenizer altogether — and works with Ollama today. Let me know if you want a demo or outline.
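As a rough sketch (not MeRNSTA's actual code), similarity-driven chunking along these lines could look like the following, using a MiniLM model from sentence-transformers; the similarity threshold is arbitrary.

```python
# Illustrative similarity-driven chunking: sentences are grouped while they stay
# semantically close to the start of the current chunk, instead of being cut at
# a fixed token count.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_by_similarity(sentences: list[str], threshold: float = 0.5) -> list[list[str]]:
    if not sentences:
        return []
    embeddings = model.encode(sentences, convert_to_tensor=True)
    chunks, current = [], [0]
    for i in range(1, len(sentences)):
        # Compare each new sentence against the first sentence of the open chunk.
        if cos_sim(embeddings[i], embeddings[current[0]]).item() >= threshold:
            current.append(i)
        else:
            chunks.append(current)
            current = [i]
    chunks.append(current)
    return [[sentences[i] for i in idx] for idx in chunks]
```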
You're absolutely right about the token estimation issue—especially when dealing with semantically dense or volatile input. That’s precisely why I moved away from dynamic token counting altogether and implemented a hybrid approach in MeRNSTA.
My system uses a dual-layer memory design:
Structured Fact Representation
Inputs are parsed into normalized (subject, predicate, object) triplets and stored in SQLite under the enhanced_facts table. Each entry is augmented with:
Contradiction and volatility flags
Confidence scores and change history
Temporal, session, and user-profile metadata
Optional vector embeddings for semantic operations
Contradictions are tracked via contradiction_records, enabling dynamic volatility scoring and automated contradiction resolution.
Vector-Based Semantic Similarity
Rather than attempting to tokenize large documents repeatedly (which as you've pointed out becomes untenable), I use sentence-transformers (MiniLM) to embed fact-level assertions at ingestion time. During recall, the system performs vector similarity search to surface contextually relevant memory without ever re-tokenizing the full memory base. This keeps latency predictable and scales much better under evolving memory state.
MeRNSTA avoids real-time token estimation altogether by decomposing input into discrete, vectorized facts. Only the semantically relevant subset is ever reassembled for LLM prompts. That design decision made token overflows a non-issue—even when working with complex contextual recall or highly entropic sessions.
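A minimal sketch of that ingestion/recall split (again illustrative, not the actual MeRNSTA implementation): facts are embedded once when stored, and recall is a cosine-similarity search over the stored vectors, so nothing is re-tokenized at query time.

```python
# Embed facts once at ingestion; recall by vector similarity, never re-tokenizing.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
facts: list[str] = []
vectors: list[np.ndarray] = []

def ingest(fact: str) -> None:
    facts.append(fact)
    vectors.append(encoder.encode(fact, normalize_embeddings=True))

def recall(query: str, k: int = 5) -> list[str]:
    q = encoder.encode(query, normalize_embeddings=True)
    scores = np.stack(vectors) @ q              # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [facts[i] for i in top]
```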
I've only published portions of the system publicly so far, like the memory schemas, contradiction handling, and partial semantic indexing logic, because the full orchestration engine includes self-evolving code, real-time memory reinforcement, and internal scaffolding I'm not ready to open-source until I finalize the broader cognitive loop.
@raffaeler commented on GitHub (Jul 25, 2025):
This strategy doesn't work in my case. Either the upper limit is very high (Qwen3, for example), or you need to iterate until you fit within your threshold.
In my research (and in the papers) I found that aggregation by vector similarity is not the best way to create chunks. It is a poor strategy in many use-cases, such as legal ones, where many phrases are relevant to a context but express different concepts (depositions, for example). In other use-cases I found the same problem with quotes.
My analysis starts from the syntax tree of the original document (which must have been accurately filtered in the previous step). This is where I can visualize clusters of concepts inside the documents, and when I see well-separated clusters, I know the result is good.
@js402 commented on GitHub (Jul 25, 2025):
Hey Raffaele Rialdi,
I know this is off-topic, but I'm super interested in your research.
I would be very grateful if you could share where to find your papers.
@raffaeler commented on GitHub (Jul 25, 2025):
Unfortunately the papers are very generic on this topic.
I explain in detail what I am doing in the talks I am giving on the topic, but there are no published videos (yet).
If I find some time, I may decide to publish something, but do not expect it soon, as I am hard at work on two large projects right now.
Sorry
@icedmoca commented on GitHub (Jul 28, 2025):
It makes sense why vector similarity wouldn't cut it for your use case. Semantic clustering breaks down when the input spans multiple dense but semantically divergent regions: quotes, depositions, overlapping testimony, etc. In those contexts, syntax trees give you much tighter control over conceptual boundaries, and your use of syntactic clustering is right for preserving document logic and ensuring legal/technical fidelity.
For MeRNSTA, I had different constraints: I needed a system that could reason across evolving conversational facts, detect contradictions over time, and maintain stability without ever re-tokenizing huge memory bases. So instead of using token counts or syntax trees, I chunk based on concept recurrence and contradiction volatility. That works well for dynamic memory systems, but definitely not for high-precision document ingestion like yours. I would love to see your syntax-based clustering pipeline if you ever publish it, especially how you're managing ambiguity and cross-cluster references.
I also agree that if Ollama ever exposes /tokenize, the hybrid approach is probably ideal: syntax or semantic chunking first, then real token-budget trimming as a final step. That's the missing piece for a lot of us.
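A sketch of that hybrid final step, assuming the same hypothetical /api/tokenize endpoint discussed earlier (not a shipped Ollama API): chunk by meaning first, then enforce a hard token budget with the model's own tokenizer.

```python
# Hypothetical endpoint; field names are assumptions. Keeps semantically chunked
# pieces until the model-accurate token budget would be exceeded.
import requests

def count_tokens(model: str, text: str) -> int:
    r = requests.post("http://localhost:11434/api/tokenize",
                      json={"model": model, "content": text})
    r.raise_for_status()
    return len(r.json()["tokens"])

def trim_to_budget(model: str, chunks: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for chunk in chunks:                 # chunks from semantic or syntactic chunking
        n = count_tokens(model, chunk)
        if used + n > budget:
            break
        kept.append(chunk)
        used += n
    return kept
```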
Also, @js402, I would like to chat about your work if possible; I'm also working on an intent engine that uses a blockchain architecture with AI agents.
@icedmoca commented on GitHub (Jul 28, 2025):
@js402 @raffaeler @ParisNeo @ParthSareen
Full Implementation of /tokenize and /detokenize Endpoints for Ollama

Hi all, I went ahead and implemented the missing /tokenize and /detokenize endpoints for Ollama and validated them using mistral:latest. This directly resolves the original request in #3582 and related threads.

What I Built

The Ollama server currently exposes no way to perform model-accurate tokenization or detokenization. This creates major problems for developers trying to interoperate with GGUF or non-HF models. Most of us were forced to hack around this using tiktoken, which doesn't match GGUF or model-specific BPEs.

What I Added

I used the existing LlamaServer interface, which already contains internal Tokenize() and Detokenize() methods, exposed them via HTTP endpoints, and integrated them with the scheduler.

Endpoints:

```
POST /api/tokenize
Request:  { "model": "mistral:latest", "content": "Hello, world!" }
Response: { "tokens": [23325, 29493, 2294, 29576] }

POST /api/detokenize
Request:  { "model": "mistral:latest", "tokens": [23325, 29493, 2294, 29576] }
Response: { "content": " Hello, world!" }
```

✔ Fully model-aligned (works for GGUF and HF)
✔ Uses the same tokenization logic used in generation
✔ Returns timings and model info
✔ Full round-trip verification

Proof

⏱ Live server response:

```
[GIN] 2025/07/28 - 16:28:25 | 200 |  3.33s | 127.0.0.1 | POST /api/tokenize
[GIN] 2025/07/28 - 16:28:41 | 200 | 5.40ms | 127.0.0.1 | POST /api/detokenize
```

curl test

Tokenize

input:

```bash
curl http://localhost:11434/api/tokenize \
  -H "Content-Type: application/json" \
  -d '{ "model": "mistral:latest", "content": "Hello, world!" }'
```

output:

```
{ "model": "mistral:latest", "tokens": [23325, 29493, 2294, 29576], "total_duration": 3333091020, "load_duration": 3332916624 }
```

Detokenize

input:

```bash
curl http://localhost:11434/api/detokenize \
  -H "Content-Type: application/json" \
  -d '{ "model": "mistral:latest", "tokens": [23325, 29493, 2294, 29576] }'
```

output:

```
{ "model": "mistral:latest", "content": " Hello, world!", "total_duration": 5376833, "load_duration": 5371033 }
```

The detokenized output matches exactly, including the expected leading space!

Changes

- api/types.go: Added TokenizeRequest, TokenizeResponse, DetokenizeRequest, DetokenizeResponse
- server/routes.go: Added TokenizeHandler, DetokenizeHandler, registered both routes
- api/client.go: Added Tokenize() and Detokenize() methods to the client
- api/examples/tokenize/main.go: Round-trip demo
- integration/api_test.go: Integration test for round-trip correctness
- docs/api.md: Documented new endpoints
- Uses scheduleRunner() to access models, supports keep_alive, consistent with other endpoints

View what I changed here: aa855f2b15

Also @ParthSareen, I've implemented full /tokenize and /detokenize endpoints with support for media_type, keep_alive, and a modular TokenizerAdapter interface to future-proof for other engines and modalities. I validated round-trip integrity across a wide range of stress cases (multilingual, emojis, fuzzed Unicode, edge whitespace, markdown/math, RTL), and keep-alive drastically reduces cold-start overhead. While this doesn't yet handle multimodal input like images, the adapter pattern was explicitly designed to support it cleanly. I'd appreciate your review or suggestions on any blockers I may have missed.
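For reference, a minimal round-trip check in the spirit of the validation described above might look like this, again assuming the hypothetical endpoints and field names from this thread.

```python
# Round-trip sanity check against the hypothetical endpoints; the only tolerated
# difference is a single leading space that some tokenizers add when decoding.
import requests

BASE, MODEL = "http://localhost:11434", "mistral:latest"
samples = ["Hello, world!", "naïve café ☕", "مرحبا بالعالم", "  edge  whitespace\tcase"]

for text in samples:
    toks = requests.post(f"{BASE}/api/tokenize",
                         json={"model": MODEL, "content": text}).json()["tokens"]
    back = requests.post(f"{BASE}/api/detokenize",
                         json={"model": MODEL, "tokens": toks}).json()["content"]
    assert back in (text, " " + text), (text, back)
```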
@raffaeler commented on GitHub (Jul 29, 2025):
@icedmoca Exactly. There is certainly more than one winning strategy to chunk documents. This mostly depends on the document type and the use-case. Measuring them is the best way to decide what is better.
I'll probably publish my work once it is polished and tested across multiple scenarios, but I have to find the time.
Regarding your strategy, I am not familiar with MeRNSTA, nor can I find anything with that precise name. What is it?
With regard to your new implementation: great work, but as I mentioned above, there are already multiple open PRs and none of them has been reviewed or approved. I don't know the reason, but an "official" version is needed before this feature can actively be used and promoted.
@icedmoca commented on GitHub (Aug 2, 2025):
Hey @raffaeler, I agree on the PR backlog and on needing an official version first; hopefully something along these lines gets implemented.
As for MeRNSTA — it’s a custom neuro-symbolic "AGI" framework I’ve been developing that combines a contradiction-resolving memory graph, causal + temporal fact tracking, multi-agent debate/reflection layers, and recursive self-evolution (it mutates and tests its own agents).
I used the tokenizer/detokenizer endpoints to create an adaptive encoder that rewrites prompts based on past memory and contradiction history. And it actually works and doesn't hallucinate!! It's crazy... I MIGHT open-source it once it's stable, but if you're curious, I'm happy to DM a breakdown or benchmark snippets. The mernsta repo on my profile is outdated but does reflect aspects of it. It's truly crazy what I've been able to accomplish with vibe planning and some vibe coding.
Edit: Also I think this might interest you, given you're looking for something to chunk large documents.