Mirror of https://github.com/ollama/ollama.git (synced 2026-05-07 00:22:43 -05:00)
[GH-ISSUE #7288] embedding generation failed. wsarecv: An existing connection was forcibly closed by the remote host. #51143
Open · opened 2026-04-28 18:32:21 -05:00 by GiteaMirror · 36 comments
Originally created by @viosay on GitHub (Oct 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7288
What is the issue?
embedding model
When I submit a single fragment, it responds normally, but when I submit multiple fragments, an exception occurs.
I encountered this error on different Windows systems as well.
This issue occurs in both versions 0.3.14 and 0.4.0-rc3. However, I also tested versions 0.3.13 and 0.3.10, and they work perfectly.
OS
Windows
GPU
No response
CPU
Intel
Ollama version
0.3.14~0.4.6
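As a rough illustration of the failure mode described above (a sketch only; the model name and fragment text are placeholders, not the reporter's actual payload), a single-fragment versus multi-fragment request against /api/embed looks like this:

```python
# Sketch: compare a single-fragment and a multi-fragment embedding request.
# Model name and inputs are placeholders; substitute an affected embedding model.
import requests

OLLAMA_EMBED = "http://localhost:11434/api/embed"
MODEL = "viosay/conan-embedding-v1"  # placeholder

# Single fragment: reported to respond normally.
single = requests.post(OLLAMA_EMBED, json={"model": MODEL, "input": "fragment one"})
print(single.status_code)

# Multiple fragments in one request: reported to fail on 0.3.14 and later.
multi = requests.post(
    OLLAMA_EMBED,
    json={"model": MODEL, "input": ["fragment one", "fragment two", "fragment three"]},
)
print(multi.status_code, multi.text[:200])
```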
@rick-github commented on GitHub (Oct 21, 2024):
Which model are you using?
@viosay commented on GitHub (Oct 21, 2024):
I tried many embedding models, including 893379029/piccolo-large-zh-v2 and viosay/conan-embedding-v1, and they all have the same issue, although they worked perfectly fine before. However, a few models, like shaw/dmeta-embedding-zh, do not have this problem.
@rick-github commented on GitHub (Oct 21, 2024):
I am unable to replicate:
It might have something to do with the client or the length of the inputs. Can you provide more context on your usage, or better yet, a script that demonstrates the problem?
@viosay commented on GitHub (Oct 22, 2024):
@rick-github You're right. Based on my tests, the issue is indeed related to the input length. When it exceeds a certain length, an error occurs.
It’s like the example below, where an error occurred.
@rick-github commented on GitHub (Oct 22, 2024):
viosay/conan-embedding-v1 has an embedding length of 1024 and your test text is 1905 bytes, so it's exceeding the window. The client should chunk the text to segments smaller than the embedding length otherwise the returned embeddings will be missing semantic content. However, ollama (or actually llama.cpp) should handle the situation more gracefully.
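A minimal sketch of the client-side chunking suggested here, assuming a simple character-based split (the chunk size below is an illustrative stand-in for a per-model token budget, not a value taken from any specific model):

```python
# Sketch: split long text into chunks client-side before requesting embeddings.
# The character budget is a crude stand-in for a real token budget.
import requests

OLLAMA_EMBED = "http://localhost:11434/api/embed"
MODEL = "shaw/dmeta-embedding-zh"  # placeholder

def chunk_text(text: str, max_chars: int = 800) -> list[str]:
    """Split text into pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def embed_long_text(text: str) -> list[list[float]]:
    chunks = chunk_text(text)
    resp = requests.post(OLLAMA_EMBED, json={"model": MODEL, "input": chunks})
    resp.raise_for_status()
    return resp.json()["embeddings"]  # one vector per chunk
```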
@viosay commented on GitHub (Oct 22, 2024):
Thank you for your response! The key issue is that this problem did not exist in versions prior to Ollama 0.3.14. I had been using various models without any issues before that.
@viosay commented on GitHub (Oct 22, 2024):
For example, the shaw/dmeta-embedding-zh model, which has an embedding length of 768, does not encounter this issue. Both the new and old versions do not have this issue.
@rick-github commented on GitHub (Oct 22, 2024):
ollama moved to a more recent llama.cpp snapshot for the granite model support (f2890a4494) and presumably that has introduced some problems with embedding calls. I don't see any recent issues regarding that in the llama.cpp issue tracker, so this is not affecting too many users. Exceeding the length doesn't mean that all models will fail, some may be more resilient. I'll dig a bit more and file an issue with llama.cpp if this is the actual problem. In the meantime, you should adjust the text chunking anyway, as the embeddings will not contain all of the information in the original text.
@viosay commented on GitHub (Oct 22, 2024):
@rick-github Thank you very much! I will follow your advice. One more thing I noticed is that most of the models that encounter issues are those imported after being converted to GGUF using the convert_hf_to_gguf.py script from llama.cpp. I'm not sure if this is the cause of the problem.
@rick-github commented on GitHub (Oct 22, 2024):
Just to correct a mistake I made, viosay/conan-embedding-v1 has a limit of 512 tokens, and shaw/dmeta-embedding-zh a limit of 1024 tokens. The embedding length is the size of the generated embeddings.
@viosay commented on GitHub (Oct 22, 2024):
Yes, I understand the meaning of embedding length. Whether it’s 512 or 1024, they are both less than 1905. This is a puzzling issue, and as you mentioned, it seems to have arisen after Ollama updated llama.cpp. I'll continue testing and verifying the specific situation. Thank you!
@rick-github commented on GitHub (Oct 22, 2024):
Tokens are different to characters. A token is a sequence of characters, on average 2 or 3 characters in length. So a token length of 512 would handle 1024-1536 characters, and a token length of 1024 would handle 2048-3072 characters.
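Turning that average into a conservative character budget is straightforward; the helper below assumes the low end of the 2-3 characters-per-token range, which is only a heuristic and no substitute for counting with the model's actual tokenizer:

```python
# Heuristic: convert a model's token limit into a conservative character budget,
# assuming roughly 2 characters per token (the low end of the range above).
def max_chars_for(token_limit: int, chars_per_token: float = 2.0) -> int:
    return int(token_limit * chars_per_token)

print(max_chars_for(512))   # 1024 characters for a 512-token model
print(max_chars_for(1024))  # 2048 characters for a 1024-token model
```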
@viosay commented on GitHub (Oct 25, 2024):
I think I've figured out the issue; it seems that the truncate didn't take effect.
@rick-github commented on GitHub (Oct 25, 2024):
It does truncate, it's just that the runner throws a GGML_ASSERT(i01 >= 0 && i01 < ne01) failed exception and crashes when the number of tokens is close to the maximum allowed and the runner has been started with a context window greater than the actual supported value. If the model is loaded with the context size set to the actual supported context size, it works fine:
@mokby commented on GitHub (Oct 25, 2024):
@viosay Hi, I met the same error, do you have any solution to solve it? I tested many embedding models, but only mxbai-embed-large and nomic-embed-text work fine.
@viosay commented on GitHub (Oct 25, 2024):
There’s no good solution for now. Either roll back to version 0.3.13, or try setting the actual context size as Rick suggested above. However, there’s a risk of semantic loss in the returned embeddings. It might be best to first try splitting the text into chunks smaller than the embedding length on the client side, but due to differences in token calculation methods, your chunks may not match the model’s segmentation precisely. For example, when I use the jtokkit library from SpringAI for token calculation and segmentation, it oddly treats multiple consecutive dots as a single token. This results in the actual chunks being much larger than expected, which still causes the error.
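For reference, the "set the actual context size" workaround mentioned above can be applied per request by passing num_ctx in the options; a sketch (512 matches the conan-embedding-v1 limit discussed earlier, substitute your own model's limit):

```python
# Sketch: pass the model's real context size as num_ctx in the request options.
import requests

resp = requests.post(
    "http://localhost:11434/api/embed",
    json={
        "model": "viosay/conan-embedding-v1",  # placeholder
        "input": "text to embed",
        "options": {"num_ctx": 512},  # the model's actual supported context length
    },
)
resp.raise_for_status()
print(len(resp.json()["embeddings"][0]))
```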
@zydmtaichi commented on GitHub (Oct 29, 2024):
Hi @viosay and @rick-github ,
I met the same error and tried to reduce the length of chunk tokens sent to the embedding API, but it doesn't seem to work. I use the model milkey/gte:large-zh-f16 and set embedding_dim to 1024 with chunk_token_size equal to 1200 in the lightrag framework, which accesses the embedding model via ollama. The 500 internal error does not change even if I reduce chunk_token_size to 400.
@rick-github commented on GitHub (Oct 29, 2024):
It works if I set num_ctx to 512. Perhaps the lightrag framework is adding extra tokens, or there is an issue with chunk_token_size.
@mokby commented on GitHub (Oct 30, 2024):
Amazing! That works, thanks for your help!
@viosay commented on GitHub (Oct 30, 2024):
@mokby This is what I mentioned above about Rick's suggestion to set the actual context size. However, there is a risk of semantic loss in the returned embeddings. The engine might enforce input truncation, and the real reason has yet to be identified. The most likely cause is still issues with token segmentation and calculation.
@mokby commented on GitHub (Oct 30, 2024):
Yeah, that may be a potential problem. Can you share your solution if you manage to handle this issue? Many thanks.
@viosay commented on GitHub (Nov 16, 2024):
ChatGPT pointed out that the issue lies in the llama_server startup command, where ctx-size was set to 2048 but should actually be 512. However, compared to version 0.3.13, these settings are the same.
@viosay commented on GitHub (Nov 27, 2024):
Providing a reproduction along with the debug logs with OLLAMA_DEBUG=1 enabled. I found that the content in the log output is inconsistent with the input text.
Using OpenAI's tokenizer, the calculation shows that the input tokens do not exceed the model's maximum supported token count of 512.
@viosay commented on GitHub (Nov 27, 2024):
@rick-github I hope the latest debugging logs I provide will help identify the issue. Thanks.



Additionally, after calculation, it was confirmed that the text in the previous example does not exceed the 512-token limit.
@rick-github commented on GitHub (Dec 23, 2024):
The tokenizer used by OpenAI is different to the tokenizer used by conan-embedding-v1. You can see from your screenshots that all three OpenAI models return a different token count for the same text. The prompt that you are using with OpenAI is not quite the same as the one you provided (1893 characters vs 1905 characters) so we need to knock off a couple of tokens, but conan-embedding-v1 creates 571 tokens.
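To count tokens the way the embedding model itself does rather than with OpenAI's tokenizer, the model's original Hugging Face tokenizer can be loaded; a sketch, with the repository id left as a placeholder (use the model's real upstream repo):

```python
# Sketch: count tokens with the embedding model's own tokenizer.
from transformers import AutoTokenizer

# Placeholder repo id; substitute the embedding model's actual upstream repository.
tok = AutoTokenizer.from_pretrained("<upstream-org>/conan-embedding-v1")
text = open("prompt.txt", encoding="utf-8").read()
print(len(tok(text)["input_ids"]))  # token count as this model sees it
```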
@rick-github commented on GitHub (Jan 15, 2025):
Just to summarize the content from above:
The problem is that the context length that ollama is using is longer than the context length that the embedding model supports. If num_ctx is not supplied in the API call or the Modelfile, ollama will use a default context length of 2048. If this is longer than the context length of the model, a client can send a request longer than the model can accommodate, which can cause the runner to crash. The models in the ollama library currently have the attributes in the table below; models that are a crash risk with the default parameters are marked.
You can prevent these errors by setting num_ctx in the API call (eg "options":{"num_ctx":512}), or modifying the model to specify the context length. Note that the reason the errors are occurring is because ollama is getting embeddings for text lengths greater than that supported by the model. This means that the text will be truncated and the embeddings will be losing semantic content. The chunk size of the embedding client should be less than context_length.
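To find the context length a given model actually supports (rather than guessing), the /api/show endpoint can be queried; a sketch, noting that the context-length entry in model_info is prefixed with the architecture name (for example bert.context_length), so the exact key varies by model:

```python
# Sketch: look up a model's trained context length via /api/show, then use it as num_ctx.
import requests

resp = requests.post(
    "http://localhost:11434/api/show",
    json={"model": "shaw/dmeta-embedding-zh"},  # placeholder
)
resp.raise_for_status()
info = resp.json().get("model_info", {})
ctx = {k: v for k, v in info.items() if k.endswith(".context_length")}
print(ctx)  # e.g. {"bert.context_length": 1024}; pass this value as num_ctx
```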
@rick-github commented on GitHub (Jan 15, 2025):
To dig a bit deeper: the root cause is a mis-calculation in the truncation logic. The prompt is truncated to num_ctx at the entry point of the API, but further down the call tree BOS and EOS tokens are added, taking the input buffer to (say) 514 tokens rather than 512. There's more logic in the cache that tries to handle this but doesn't work when num_ctx >> context_length. Unfortunately, when the cache logic does kick in, it removes tokens from the start of the input, which is likely to impact the usefulness of the embedding.
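A purely illustrative sketch of that off-by-BOS/EOS overflow (not ollama's actual code): truncating to num_ctx first and adding the special tokens afterwards leaves the buffer two tokens over the window the runner allocated.

```python
# Illustrative only: truncate to num_ctx, then add BOS/EOS, and the buffer overflows.
NUM_CTX = 512
BOS, EOS = 0, 2                           # stand-in special token ids

prompt_tokens = list(range(600))          # pretend tokenizer output, 600 tokens
truncated = prompt_tokens[:NUM_CTX]       # truncation at the API entry point
with_specials = [BOS] + truncated + [EOS] # special tokens added further down the call tree
print(len(with_specials))                 # 514 > 512: overruns the allocated context
```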
@viosay commented on GitHub (Feb 26, 2025):
Reconsidering this issue: when the exception occurs, the request input is completely normal, with characters properly segmented according to length, as shown in the example below. However, after reviewing the debug log, I found that spaces were added between each character in the outgoing request content, which increased the number of tokens. The length exceeds the limit after the spaces are added, causing an overflow and resulting in an error at 1026.
debug log:
What I want to know is why spaces are automatically added between each character.
Actually, this issue with spaces has also been reflected in the previous replies: https://github.com/ollama/ollama/issues/7288#issuecomment-2503273262
@rick-github commented on GitHub (Feb 26, 2025):
The data in the log are characters, not tokens. The padding is a function of the tokenizer table in shaw/dmeta-embedding-zh. The tokenizer uses sentencepiece and the spaces are represented internally as the special character "▁". In the process of truncating the input to make it fit in the context buffer, the input is tokenized and then detokenized, and the latter step results in the padding. However, the number of tokens is the same. For example, tokenizing "府" (the first glyph from your test input) returns a value of 2424. De-tokenizing the value of 2424 returns the glyph sequence " 府". These are both the same as far as the tokenizer is concerned:
@viosay commented on GitHub (Feb 27, 2025):
@rick-github Thank you, I think I understand now.
@leodeslf commented on GitHub (Mar 24, 2025):
Just in case...
It happened to me. My specific problem was that the default value for num_ctx in llamaindex exceeded that of the model I was trying to use.
If you are here, you probably want to check whether there are conflicts between the defaults of whatever tool you are using and those of your specific model.
@rick-github commented on GitHub (Mar 24, 2025):
Now that the switch to 0.6 has happened and the new runner architecture looks a bit stable, I will look at creating a PR to fix this.
@wikty commented on GitHub (Apr 13, 2025):
I'm the maintainer of dmeta-embedding-zh. The compatibility issue with ollama 0.6.x has been fixed; please re-download the model:
@ynott commented on GitHub (Apr 7, 2026):
Adding a data point for posterity (low priority).
Hit what looks like the same family of bug on Ollama v0.20.2 with jeffh/intfloat-multilingual-e5-small:q8_0 (BERT/XLM-R, GGUF Q8_0). Not asking for action — just leaving a record in case someone else lands here from a search.
Symptom
POST /api/embed (and /api/embeddings) crashes the runner with the GGML_ASSERT(i01 >= 0 && i01 < ne01) failure. The runner is started with --ollama-engine (new engine path).
It's input-dependent, not "Japanese breaks it"
I wrote a small repro script (gist) that runs 17 single-string inputs against /api/embed. Same model, same endpoint, no batching. So it's not "multibyte input", input length, or leading whitespace. Single-character katakana like ス and テ work fine, but the 2-character テス and カナ crash. Single kanji and hiragana of any length I tested work. Inputs containing 。 or 、 always crash.
Best guess: SentencePiece is producing certain merged-token IDs (e.g. ▁テス, ▁カナ, ▁。) that fall outside the valid embedding-table index range under the new engine, while single-character tokens or other merge paths stay within range. Notably, ollama run "テスト文章です。" works fine with the same model — only /api/embed crashes — which suggests the issue is in the embedding-specific code path rather than the tokenizer in general.
The same script against bge-m3 on the same Ollama instance passes cleanly, so the issue is specific to this particular model + new engine combination, not the embedding code path as a whole.
Things that did NOT help
- num_ctx: 512 via Modelfile (the workaround that helped in #8431)
- truncate: false, keep_alive: 0, options.num_ctx: 512 in the request body
- the OLLAMA_NEW_ENGINE=false env var (no longer respected in v0.20.2 — the runner still launches with --ollama-engine)
- the /api/embeddings endpoint instead of /api/embed
What did help
Switching to bge-m3 (official Ollama library). Same endpoint, same Ollama version, same hardware, same exact inputs — clean embeddings every time, including batched requests.
Environment
- Ollama v0.20.2, --ollama-engine runner
- jeffh/intfloat-multilingual-e5-small:q8_0 (architecture: bert, n_ctx_train: 512, embedding length: 384, Q8_0)
- bge-m3 (for comparison)
Leaving this on #7288 since it looks like part of the same broader cluster of /api/embed + GGML_ASSERT(i01 >= 0 && i01 < ne01) regressions reported here over multiple versions. Workaround for anyone hitting this from a search: try a different embedding model (e.g. bge-m3) before deeper debugging.
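The gist itself is not reproduced here, but a probe of the same shape (a sketch; the inputs and model name are taken from the comment above) would look like:

```python
# Sketch: send each input as its own /api/embed request and report which ones fail.
import requests

MODEL = "jeffh/intfloat-multilingual-e5-small:q8_0"
INPUTS = ["ス", "テ", "テス", "カナ", "テスト文章です。"]

for text in INPUTS:
    try:
        r = requests.post(
            "http://localhost:11434/api/embed",
            json={"model": MODEL, "input": text},
            timeout=60,
        )
        status = f"HTTP {r.status_code}"
    except requests.RequestException as exc:
        status = f"connection error: {exc}"  # a runner crash surfaces as a dropped connection
    print(f"{text!r}: {status}")
```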
@joaquinariasco-lab commented on GitHub (Apr 17, 2026):
Can you share a minimal reproducible example including the exact "multiple fragments" payload you send (array vs concatenated string), the approximate token/character length of each fragment, the model + version used, and your truncate / context length settings, so we can determine whether the GGML_ASSERT failure is triggered by batching behavior or by exceeding the model's embedding window?
@ynott commented on GitHub (Apr 23, 2026):
@joaquinariasco-lab Thank you for the follow-up.
Environment:
Below is the simplest command to reproduce the issue:
This fails on my side.
For comparison, this succeeds:
I also tested a multiple-fragments payload (failed):
I also tested the same inputs with the Ollama CLI:
Observed results:
Approximate lengths:
This does reproduce with a multiple-fragment payload, but also with very short single-string inputs. For this reason, this reproduction does not appear to be specific to batching, nor does it appear to be due to exceeding the embedding window.
I have uploaded the debug log here:
https://gist.github.com/ynott/8a9ad624e8aae8dfcb8221a343e4030b
Relevant details from the failing "テス" case: