Mirror of https://github.com/ollama/ollama.git
Closed · 47 comments
Originally created by @moyix on GitHub (Apr 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3759
What is the issue?
I'm using llama3:70b through the OpenAI-compatible endpoint. When generating, I am getting outputs like this:
This is probably related to https://github.com/vllm-project/vllm/issues/4180? There is also an issue/PR on the LLaMA 3 HuggingFace repo: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/4
But it's a bit confusing since <|eot_id|> is already included in the stop sequences. Is there some other config param that needs to be updated?
OS: Linux
GPU: Nvidia
CPU: AMD
Ollama version: 0.1.32
@binaryc0de commented on GitHub (Apr 19, 2024):
Noticing the same behavior here. When using the langchain package with Ollama, the model often doesn't stop generating once prompted.
@JasonXiao89 commented on GitHub (Apr 19, 2024):
Same issue using llama3:latest 71a106a91016
@olinorwell commented on GitHub (Apr 20, 2024):
I had the same issue and got around it by adding the stop token to the request that the front-end I am using (LibreChat) was making to Ollama's OpenAI-compatible API endpoint.
I'm sure a more permanent solution will arrive, but for now that does the trick.
(Note: the elephant in the room, of course, is that the stop token is in the model file as shown above, but that setting appears to be ignored when using the OpenAI-compatible endpoint. Perhaps that is fixed to OpenAI's traditional stop tokens and needs my solution above to get around the limitation.)
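For anyone hitting this through the OpenAI-compatible layer, a minimal sketch of that workaround looks like the request below; the model tag, prompt, and port are just examples, and the stop field is the standard chat-completions parameter rather than anything Ollama-specific.

```bash
# Sketch only: pass the stop token explicitly on Ollama's OpenAI-compatible endpoint.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:70b",
    "messages": [{"role": "user", "content": "What is 1+1?"}],
    "stop": ["<|eot_id|>"]
  }'
```

This only papers over the problem; once the template/EOS handling is fixed, the model's own stop parameters should make the explicit list unnecessary.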
@taozhiyuai commented on GitHub (Apr 20, 2024):
my model file works fine.
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3:8b-instruct-fp16
FROM /Users/taozhiyu/.ollama/models/blobs/sha256-a4bbea838ebde985f2f99d710c849219979b9608e44e1c3c46416b5fbff72d64
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop ""<|reserved_special_token""
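To try a Modelfile like the one above against a model you have already pulled, one approach (the llama3-fixed tag below is just an example name) is to dump the existing Modelfile, adjust the TEMPLATE and PARAMETER stop lines, and build a new local model from it:

```bash
# Dump the Modelfile of an already-pulled model, edit it, then rebuild locally.
ollama show --modelfile llama3:8b-instruct-fp16 > Modelfile
# ...edit the TEMPLATE / PARAMETER stop lines as needed...
ollama create llama3-fixed -f Modelfile
ollama run llama3-fixed
```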
@telehan commented on GitHub (Apr 20, 2024):
Created from a gguf (70b q4); it's the same problem with ollama run. This gguf was downloaded from https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF
@taozhiyuai commented on GitHub (Apr 20, 2024):
Try my model file. Your file is wrong; it was maybe imported from a gguf.
@Madd0g commented on GitHub (Apr 20, 2024):
Happens to me too, on macOS using 0.1.32 and Meta-Llama-3-8B-Instruct-Q6_K.gguf.
When I add assistant\n, <|eot_id|> to the stop tokens, it seems to work at first, but then it begins stopping in the middle of sentences. I upgraded Ollama just to see if it fixes the problem, removed the stop parameters from the client, and now I see it spamming <|eot_id|> in the middle of the sentence (like 30 of them in a row, and then stopping).
@telehan commented on GitHub (Apr 20, 2024):
this gguf version works fine, try it https://huggingface.co/QuantFactory/Meta-Llama-3-70B-Instruct-GGUF/tree/main
the previous gguf 70b has the problem
@FutureGadget commented on GitHub (Apr 21, 2024):
Solved this manually by adding the stop parameter, but I think this is a bug.
@leotam commented on GitHub (Apr 21, 2024):
Only difference from the 70b-instruct is:
@jukofyork commented on GitHub (Apr 21, 2024):
https://github.com/ggerganov/llama.cpp/issues/6772
I edited my gguf to use the <|eot_id|> token but it still prints it out? Using gguf-dump I can confirm I have made the change from the reddit thread, but I don't understand why it prints <|eot_id|>. I've never had another gguf model print the EOS token defined in the gguf header, so I don't get what's special about this one. So I had to also add:
But does:
actually work as expected in Ollama and add <|eot_id|> after the AI's response, as required by the wrapped llama.cpp server?
We desperately need some way to debug stuff like this in Ollama, as it seems model creators are currently competing for the most confusingly complex prompt template possible 😞
@phalexo commented on GitHub (Apr 22, 2024):
Infinite loop here too.
FROM /opt/data/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/Meta-Llama-3-70B-Instruct.Q5_K_S.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER num_ctx 8192
PARAMETER temperature 0
PARAMETER num_gpu 63
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
SYSTEM """You are an AI programming, planning assistant. You never refuse to answer questions or provide code.
Write a response that appropriately completes the request within the user message."""
Hello! It's nice to meet you. Is there something I can help you with, such as a programming problem or a question about a specific topic? Or would you like to
discuss a project idea you have in mind? I'm here to assist you in any way I can.assistant
Hello! It's nice to meet you. Is there something I can help you with, such as a programming problem or a question about a specific topic? Or would you like to
discuss a project idea you have in mind? I'm here to assist you in any way I can.assistant
Hello! It's nice to meet you. Is there something I can help you with, such as a programming problem or a question about a specific topic? Or would you like to
discuss a project idea you have in mind? I'm here to assist you in any way I can.assistant
Hello! It's nice to meet you. Is there something I can help you with, such as a programming problem or a question about a specific topic? Or would you like to
discuss a project idea you have in mind? I'm here to assist you in any way I can.assistant
Hello! It's nice to meet you. Is there something I can help you with, such as a programming problem or a question about a specific topic? Or would you like to
discuss a project idea you have in mind? I'm here to assist you in any way I can.assistant
Hello! It's nice to meet you. Is there something I can help you with, such as a programming problem or a question about a specific topic? Or would you like to
discuss a project idea you have in mind? I'm here to assist you in any way I can.assistant
@jukofyork commented on GitHub (Apr 22, 2024):
Check tokenizer.ggml.eos_token_id = 128009. Seems to be working OK for me:
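To verify that field on a local GGUF before importing it, the gguf-dump tool mentioned earlier in the thread can be used; a rough check (the file path is a placeholder) would be:

```bash
# Dump the GGUF header and look for the EOS token id.
pip install gguf
gguf-dump /path/to/Meta-Llama-3-70B-Instruct.Q5_K_M.gguf | grep -i eos_token_id
# For the Llama 3 instruct models this should be 128009, i.e. <|eot_id|>.
```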
@phalexo commented on GitHub (Apr 23, 2024):
Ok, this quantized version works after ollama import.
FROM /opt/data/QuantFactory/Meta-Llama-3-70B-Instruct-GGUF/Meta-Llama-3-70B-Instruct.Q5_K_M.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER num_ctx 8192
PARAMETER temperature 0.2
PARAMETER num_gpu 73
PARAMETER stop "<|eot_id|>"
PARAMETER stop '<|end_of_text|>'
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop '<|begin_of_text|>'
SYSTEM "You are a helpful AI which can plan, program, and test."
@romkage commented on GitHub (Apr 24, 2024):
I've got looping too. I'm testing with both llama3:8b and llama3:8b-instruct-fp16.
I have tried both models with the Modelfiles mentioned above, but still no luck.
This is with crewai, at the end of the first reply:
and it just goes on.
@richardgroves commented on GitHub (Apr 24, 2024):
I've been round the houses with this as above. I eventually got it working with a stopSequence of ["<|eot_id|>"], which tells the engine to stop asking for more responses when it sees that token in the output stream of new data.
Not sure if the Meta Llama 3 card at https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/#special-tokens-used-with-meta-llama-3 is wrong, or Ollama is using the model differently.
@kungfu-eric commented on GitHub (Apr 24, 2024):
The template was updated yesterday https://ollama.com/library/llama3:70b-instruct. The only change was:
Was your change different from this line that's always been in the file?:
@richardgroves commented on GitHub (Apr 24, 2024):
@kungfu-eric I think I had (still have) an older version:
ollama show --modelfile llama3:8b
A newly pulled llama3 (latest) shows:
I'm getting extra issues as I'm working through modelfusion (https://github.com/vercel/modelfusion): unformatted chat requests to /api/chat with no stop sequences specified work for Llama 2 but not Llama 3. Tracing through the modelfusion code to work out what is going on is sloooow. Quick hacks on the completion API code got Llama 3 working by forcing "<|eot_id|>" as a specified stop sequence.
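For reference, forcing the stop sequence on the native completion endpoint, which is roughly what that hack amounts to, looks like the request below; the model tag and prompt are placeholders.

```bash
# Force <|eot_id|> as a stop sequence on Ollama's /api/generate endpoint.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "What is 1+1?",
  "options": { "stop": ["<|eot_id|>"] }
}'
```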
@richardgroves commented on GitHub (Apr 25, 2024):
Further investigation has found the specific problem, but it is no clearer whether Ollama or modelfusion is at fault.
So with llama3 this works:
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"llama3","messages":[{"role":"system","content":"You are a helpful, respectful and honest assistant."},{"role":"user","content":"What is 1+1?"}]}'
and so does this:
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"llama3","messages":[{"role":"system","content":"You are a helpful, respectful and honest assistant."},{"role":"user","content":"What is 1+1?"}],"options":{"stop":["<|eot_id|>"]}}'
but this doesn't:
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"llama3","messages":[{"role":"system","content":"You are a helpful, respectful and honest assistant."},{"role":"user","content":"What is 1+1?"}],"options":{"stop":[]}}'
But with llama 2 all of these work:
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"llama2","messages":[{"role":"system","content":"You are a helpful, respectful and honest assistant."},{"role":"user","content":"What is 1+1?"}]}'
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"llama2","messages":[{"role":"system","content":"You are a helpful, respectful and honest assistant."},{"role":"user","content":"What is 1+1?"}],"options":{"stop":[]}}'
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"llama2","messages":[{"role":"system","content":"You are a helpful, respectful and honest assistant."},{"role":"user","content":"What is 1+1?"}],"options":{"stop":["</s>"]}}'
So the difference is that using Ollama with Llama 2 and specifying a stop option of [] works, but on Llama 3 it doesn't.
Modelfusion's 'chat' paths make it less easy to set the stop options, and they send an empty [], whereas the 'completion' models do allow setting of the stop options, which is what I'd got working in my earlier message.
@reneleonhardt commented on GitHub (Apr 28, 2024):
@richardgroves thank you for analyzing this problem 👍
Would it be possible / feasible to fix it inside of ollama instead of requiring every user/application to specify different stop tokens for both models?
@nkeilar commented on GitHub (Apr 28, 2024):
Something is not right with the 70B model IMHO. I'm using the q4_K_M and the q4_0 versions with crewai, and they diverge so much from the cloud version on Groq that it's not possible to use these models with crewai. AFAIK the model should be more or less the same, and I wouldn't have expected such poor performance compared with the same model apparently served by another provider. I wasted days trying to get good results with Ollama, but it just didn't happen; I thought I was going mad, so I tried the cloud version of the same model on Groq and it just works. So either something is wrong, or there is significant model degradation, which I wouldn't expect with a Q4 version.
@jukofyork commented on GitHub (Apr 28, 2024):
There is some problem reported with the tokenizer so it could be that (assuming it's not the broken GGUF problem):
https://old.reddit.com/r/LocalLLaMA/comments/1cdmfoz/fyi_theres_some_bpe_tokenizer_issues_in_llamacpp/
@eracle commented on GitHub (Apr 28, 2024):
I fixed it with this, but I am not really sure what I am doing, since I don't know how the Ollama internals work.
@nkeilar commented on GitHub (Apr 29, 2024):
I just learned about the strange behaviour when exceeding the context length. The change proposed in this thread (https://github.com/ollama/ollama/issues/3819) improved things, but when running crewai, Ollama still seems to get stuck in a loop, generating something but never returning, so I have to force kill the process.
@richardgroves commented on GitHub (Apr 29, 2024):
There is the same problem with the Phi-3 model on Ollama:
ollama pull phi3
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"phi3","messages":[{"role":"user","content":"What is 1+1?"}],"options":{"stop":[]}}'
Note it will stop eventually, when an empty content response is received, but a few marker tokens have been sent too.
Whereas:
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"phi3","messages":[{"role":"user","content":"What is 1+1?"}]}'
stops without the extra tokens.
Hard to say it is a bug in Ollama, as "options":{"stop":[]} is basically requesting it to not stop until an empty response is sent, but it appears that for older models (e.g. mistral / llama2) it has worked to mean 'use the model file stop parameters'.
@danielgen commented on GitHub (Apr 29, 2024):
For me, Llama3 works as expected in the Ollama CLI.
However, it does not work in CrewAI, not even when specifying the same modelfile.
Not sure if Ollama is at fault here; it might well be a langchain issue or something else.
Below is the modelfile:
@olinorwell commented on GitHub (Apr 29, 2024):
Agreed. In my case Llama3 was perfect when using the Ollama CLI. The issues were when other programs connected to Ollama via the OpenAI compatible interface.
@phalexo commented on GitHub (Apr 29, 2024):
I have llama3-70b-instruct Q5 working with gptpilot. Appears quite stable.
@richardgroves commented on GitHub (Apr 29, 2024):
@danielgen @olinorwell Are you able to trace the request sent to the Ollama server from those external tools, to see if it is the same "options":{"stop":[]} problem I've written about above, or some other issue?
@boristopalov commented on GitHub (Apr 30, 2024):
It seems like an Ollama issue: I have a program that hits the Ollama API directly (it doesn't use Langchain or any other wrappers) and I was having this issue. Adding PARAMETER stop "<|eot_id|>" fixed it for me, but I now see the <|eot_id|> token at the end of each response, which is annoying.
@sabaimran commented on GitHub (May 1, 2024):
I'm also experiencing this issue while routing to Ollama via the openai chat completions Python library. It streamed 30000 characters and emitted <|eot_id|><|start_header_id|><|end_header_id|> multiple times.
Without knowing too much about the Ollama internals, there may be an issue in the way the prompt template is being formatted in the requests? And like others have pointed out, stop words are not being honored.
@97k commented on GitHub (May 1, 2024):
I am also experiencing the same issue. It doesn't happen with the Ollama CLI, but I am not able to use the APIs.
This doesn't happen with mistral.
I am using Langchain.chat_models.ollama.ChatOllama
@nickychung commented on GitHub (May 1, 2024):
Both the ollama CLI and ollama.chat resulted in a never-ending response. Changing the modelfile did not resolve the issue. However, instead of using ollama pull, I successfully addressed this problem by downloading the Llama3 GGUF from Hugging Face and running 'ollama create' with the modelfile provided there. This approach ultimately resolved the issue.
@vrijsinghani commented on GitHub (May 1, 2024):
@nickychung can you elaborate and specify the specific model and modelfile contents? I've tried a few of them and they all do not stop or generate gibberish so far.
@orangeswim commented on GitHub (May 1, 2024):
Okay, so I had a similar issue today; this was the solution for me:
pip install gguf
After changing the token to the correct EOS token, the model runs as expected.
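The original script isn't preserved in this mirror. Assuming the gguf package's gguf-set-metadata helper is available in your install, rewriting the header value in place would look roughly like this (the file path is a placeholder, and 128009 is the <|eot_id|> id for the Llama 3 instruct tokenizer):

```bash
# Sketch: point the GGUF header's EOS token id at <|eot_id|>.
pip install gguf
gguf-set-metadata /path/to/Meta-Llama-3-8B-Instruct-Q6_K.gguf tokenizer.ggml.eos_token_id 128009
# If the tool refuses to overwrite an existing value, it may need a --force flag.
```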
@nickychung commented on GitHub (May 2, 2024):
Model: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF
ModelFile:
@Sneakr commented on GitHub (May 4, 2024):
Loading the original llama3 instruct works fine, but creating a new model from a fine-tuned llama3 still has the continuous, non-stop generating issue.
@phalexo commented on GitHub (May 4, 2024):
Only some quantized models have the issue. I am using a Q5_M model and it terminates. I think some quantized models are not generating the stop token.
@Sneakr commented on GitHub (May 4, 2024):
I fine-tuned my own model; it works fine in inference mode after the tune, and converting to GGUF works fine with LM Studio, but when loaded into Ollama it has the non-stop generating issue, even with the prompt template defined for llama3.
@VideoFX commented on GitHub (May 4, 2024):
I've had the same issue.
I've tried llama3 and llama3-gradient. I've updated Ollama and the models.
I've tried crewai, langchain, and openwebUI; they all behave similarly. I've updated those to the newest versions as well.
The model will run for what seems like forever, and eventually repeats itself in a loop or talks gibberish. It has to be stopped manually.
@97k commented on GitHub (May 7, 2024):
Thank you! This worked
For anyone else going through this, I will break it down step by step (see the command sketch below):
1. Go through QuantFactory on HF and choose the quantised model you want.
2. Download the gguf model; I personally prefer q5_K_M.
3. Once downloaded, create a Modelfile (again, thanks to @nickychung).
4. Create the model using ollama create.
5. ollama ls will show you the model.
6. Celebrate!
...and it respects the EOS!
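A rough command-line version of those steps, with a placeholder Modelfile and an arbitrary local model tag:

```bash
# Build a local model from the downloaded GGUF plus the Modelfile above, then verify.
ollama create llama3-8b-instruct-q5 -f Modelfile
ollama ls                                   # the new model should appear here
ollama run llama3-8b-instruct-q5 "What is 1+1?"
```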
@richardgroves commented on GitHub (May 10, 2024):
The latest Ollama release appears to have fixed this problem for Llama3 and Phi3.
My ollama --version now reports: ollama version is 0.1.34
curl http://localhost:11434/api/chat -d '{"stream":true,"model":"llama3","messages":[{"role":"system","content":"You are a helpful, respectful and honest assistant."},{"role":"user","content":"What is 1+1?"}],"options":{"stop":[]}}'
now stops properly, as does the same test with model phi3.
So many pull requests have been merged recently that it is hard to find the exact change that fixed it.
@reneleonhardt commented on GitHub (May 11, 2024):
I'm glad this has been finally fixed!
Yeah, too many merges causing problems mixed with some trying to fix them later if you're lucky 😅
I wonder why the test suite doesn't catch tokens inside instruction model responses for different prompt templates and endpoints...
@joshuavial commented on GitHub (May 13, 2024):
I had the same problem, but the issue was an out-of-date Ollama client; upgrading sorted things out.
@eevmanu commented on GitHub (May 13, 2024):
For any onlooker: if you're on Linux and updated Ollama recently as described here https://github.com/ollama/ollama/issues/3759#issuecomment-2104445225, don't forget to restart the service (sudo systemctl restart ollama.service). YMMV, but in my case it started throwing memory errors, despite the restart instructions in docs/linux.md (9c76b30d72, L51-L52).
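For the Linux case described above, the update-plus-restart amounts to something like the following; the install script URL is the standard one from ollama.com.

```bash
# Update Ollama via the install script, then restart the systemd service so the
# new binary is actually picked up.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama.service
```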
@adryan-ai commented on GitHub (May 15, 2024):
Upgrading Ollama resolved this for me, from 0.1.32 to 0.1.37.
@jmorganca commented on GitHub (Jun 25, 2024):
Great! Sorry for the issues and glad to see the newest versions fixed this for folks. Let me know if that's not the case and I can re-open