Mirror of https://github.com/ollama/ollama.git · synced 2026-05-06 08:02:14 -05:00
Closed · opened 2026-05-04 20:07:30 -05:00 by GiteaMirror · 15 comments
Originally created by @liangstein on GitHub (Aug 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12014
Originally assigned to: @mxyng on GitHub.
What is the issue?
Each time I use qwen3-embedding-4b, there is an error:
time=2025-08-21T14:35:05.168-04:00 level=INFO source=server.go:383 msg="starting runner" cmd="/usr/local/bin/ollama runner --model /zfspool/ollama_models/blobs/sha256-b60ae5ce2dd6a0b77f82cadf21def1f310a3e10cde380ad0081b07a9d416949d --port 43943"
time=2025-08-21T14:35:05.169-04:00 level=INFO source=server.go:488 msg="system memory" total="125.0 GiB" free="114.0 GiB" free_swap="4.0 GiB"
time=2025-08-21T14:35:05.169-04:00 level=INFO source=server.go:531 msg=offload library=cpu layers.requested=0 layers.model=37 layers.offload=0 layers.split=[] memory.available="[114.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="4.7 GiB" memory.required.partial="0 B" memory.required.kv="576.0 MiB" memory.required.allocations="[118.3 MiB]" memory.weights.total="4.0 GiB" memory.weights.repeating="3.6 GiB" memory.weights.nonrepeating="393.4 MiB" memory.graph.full="384.0 MiB" memory.graph.partial="384.0 MiB"
time=2025-08-21T14:35:05.177-04:00 level=INFO source=runner.go:864 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon PRO W7700, gfx1101 (0x1101), VMM: no, Wave Size: 32, ID: GPU-fab9c6b417b1ab5e
load_backend: loaded ROCm backend from /usr/local/lib/ollama/libggml-hip.so
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2025-08-21T14:35:07.265-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-08-21T14:35:07.266-04:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:43943"
time=2025-08-21T14:35:07.276-04:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:20 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon PRO W7700) - 15160 MiB free
time=2025-08-21T14:35:07.276-04:00 level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
time=2025-08-21T14:35:07.277-04:00 level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /zfspool/ollama_models/blobs/sha256-b60ae5ce2dd6a0b77f82cadf21def1f310a3e10cde380ad0081b07a9d416949d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3 Embedding 4B
llama_model_loader: - kv 3: general.basename str = Qwen3-Embedding
llama_model_loader: - kv 4: general.size_label str = 4B
llama_model_loader: - kv 5: general.license str = apache-2.0
llama_model_loader: - kv 6: general.base_model.count u32 = 1
llama_model_loader: - kv 7: general.base_model.0.name str = Qwen3 4B Base
llama_model_loader: - kv 8: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-4B-...
llama_model_loader: - kv 10: general.tags arr[str,5] = ["transformers", "sentence-transforme...
llama_model_loader: - kv 11: qwen3.block_count u32 = 36
llama_model_loader: - kv 12: qwen3.context_length u32 = 40960
llama_model_loader: - kv 13: qwen3.embedding_length u32 = 2560
llama_model_loader: - kv 14: qwen3.feed_forward_length u32 = 97
.....
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
llama_context: CPU output buffer size = 0.59 MiB
llama_kv_cache_unified: CPU KV buffer size = 576.00 MiB
llama_kv_cache_unified: size = 576.00 MiB ( 4096 cells, 36 layers, 1/1 seqs), K (f16): 288.00 MiB, V (f16): 288.00 MiB
llama_context: ROCm0 compute buffer size = 694.64 MiB
llama_context: ROCm_Host compute buffer size = 17.01 MiB
llama_context: graph nodes = 1411
llama_context: graph splits = 472 (with bs=512), 73 (with bs=1)
time=2025-08-25T21:32:43.610-04:00 level=INFO source=server.go:1269 msg="llama runner started in 6.87 seconds"
time=2025-08-25T21:32:43.610-04:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
time=2025-08-25T21:32:43.610-04:00 level=INFO source=server.go:1231 msg="waiting for llama runner to start responding"
time=2025-08-25T21:32:43.610-04:00 level=INFO source=server.go:1269 msg="llama runner started in 6.87 seconds"
//ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:5280: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
[New LWP 3505330]
[New LWP 3505331]
[New LWP 3505332]
[New LWP 3505333]
[New LWP 3505334]
[New LWP 3505335]
[New LWP 3505336]
[New LWP 3505337]
[New LWP 3505338]
[New LWP 3505339]
[New LWP 3505342]
[New LWP 3505350]
[New LWP 3506719]
[New LWP 3506720]
[New LWP 3506721]
[New LWP 3506722]
[New LWP 3506723]
[New LWP 3506724]
[New LWP 3506725]
[New LWP 3506726]
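For reference, the assertion that fires (`GGML_ASSERT(i01 >= 0 && i01 < ne01)` in `ggml-cpu/ops.cpp`) is a row-index bounds check: some index `i01` falls outside `[0, ne01)` — for example a token id larger than the number of rows in a lookup tensor — and the runner aborts. A toy Python sketch of the invariant (not the actual ggml code; names mirror ggml's tensor-dimension naming):

```python
def get_row(table, i01):
    """Toy model of the failing invariant: a row lookup must stay in bounds."""
    ne01 = len(table)  # number of rows in the tensor (ne01 in ggml's naming)
    assert 0 <= i01 < ne01, "GGML_ASSERT(i01 >= 0 && i01 < ne01) failed"
    return table[i01]

rows = [[0.1, 0.2], [0.3, 0.4]]
get_row(rows, 1)  # in bounds, returns the second row
# get_row(rows, 5) would abort, which is what the crash above corresponds to
```

In the real crash the offending index comes from model data rather than user code, which is why the same request works on some builds and aborts on others.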
Relevant log output
OS
No response
GPU
No response
CPU
No response
Ollama version
No response
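Since the reporter's client code is not included, a single request against Ollama's embeddings endpoint is enough to exercise this path. A minimal sketch, assuming a default local server and the `/api/embed` route (the model tag and host here are illustrative):

```python
import json
import urllib.request

def build_embed_request(model: str, text: str,
                        host: str = "http://localhost:11434"):
    """Build (but do not send) the POST that asks Ollama for an embedding."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embed_request("qwen3-embedding:4b", "hello world")
# urllib.request.urlopen(req) would send it; on affected builds the runner
# crashes with the assertion above instead of returning embeddings
```

On a working build the response body carries the embedding vectors; on the affected versions reported below, the runner process dies mid-request.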
@rick-github commented on GitHub (Aug 21, 2025):
There are no errors in this log. Please post the full log.
@cornfusing commented on GitHub (Aug 22, 2025):
I encountered the same issue using 0.11.5 and 0.11.6. With 0.11.4 it works without problems.
OS: Ubuntu 24.04 for Ollama; Windows 11 for Python
CPU: Intel i9-13900K
Ollama version: 0.11.6
Docker logs at the end.
Using python:
Output:
Comparison nomic-embed-text:
Output:
Full docker log
```
time=2025-08-22T08:23:49.512Z level=INFO source=routes.go:1318 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-08-22T08:23:49.513Z level=INFO source=images.go:477 msg="total blobs: 20"
time=2025-08-22T08:23:49.513Z level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-08-22T08:23:49.513Z level=INFO source=routes.go:1371 msg="Listening on [::]:11434 (version 0.11.6)"
time=2025-08-22T08:23:49.514Z level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2025-08-22T08:23:49.514Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-08-22T08:23:49.515Z level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-08-22T08:23:49.515Z level=DEBUG source=gpu.go:503 msg="Searching for GPU library" name=libcuda.so*
time=2025-08-22T08:23:49.515Z level=DEBUG source=gpu.go:527 msg="gpu library search" globs="[/usr/lib/ollama/libcuda.so* /usr/local/nvidia/lib/libcuda.so* /usr/local/nvidia/lib64/libcuda.so*
/usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:560 msg="discovered GPU libraries" paths=[] time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:503 msg="Searching for GPU library" name=libcudart.so* time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:527 msg="gpu library search" globs="[/usr/lib/ollama/libcudart.so* /usr/local/nvidia/lib/libcudart.so* /usr/local/nvidia/lib64/libcudart.so* /usr/lib/ollama/cuda_v*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:560 msg="discovered GPU libraries" paths=[/usr/lib/ollama/libcudart.so.12.8.90] cudaSetDevice err: 35 time=2025-08-22T08:23:49.516Z level=DEBUG source=gpu.go:576 msg="Unable to load cudart library /usr/lib/ollama/libcudart.so.12.8.90: your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2025-08-22T08:23:49.516Z level=DEBUG source=amd_linux.go:422 msg="amdgpu driver not detected /sys/module/amdgpu" time=2025-08-22T08:23:49.516Z level=INFO source=gpu.go:379 msg="no compatible GPUs were discovered" time=2025-08-22T08:23:49.516Z level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="31.2 GiB" available="29.3 GiB" time=2025-08-22T08:25:20.929Z level=DEBUG source=gpu.go:393 msg="updating system memory data" before.total="31.2 GiB" before.free="29.3 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="28.6 GiB" now.free_swap="8.0 GiB" time=2025-08-22T08:25:20.929Z level=DEBUG source=sched.go:188 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 time=2025-08-22T08:25:20.936Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32 time=2025-08-22T08:25:20.936Z level=DEBUG source=sched.go:208 msg="loading first model" model=/root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = qwen3 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen3 Embedding 8B llama_model_loader: - kv 3: general.basename str = Qwen3-Embedding llama_model_loader: - kv 4: general.size_label str = 8B llama_model_loader: - kv 5: general.license str = apache-2.0 llama_model_loader: - kv 6: general.base_model.count u32 = 1 llama_model_loader: - kv 7: general.base_model.0.name str = Qwen3 8B Base llama_model_loader: - kv 8: general.base_model.0.organization str = Qwen llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-8B-... llama_model_loader: - kv 10: general.tags arr[str,5] = ["transformers", "sentence-transforme... llama_model_loader: - kv 11: qwen3.block_count u32 = 36 llama_model_loader: - kv 12: qwen3.context_length u32 = 40960 llama_model_loader: - kv 13: qwen3.embedding_length u32 = 4096 llama_model_loader: - kv 14: qwen3.feed_forward_length u32 = 12288 llama_model_loader: - kv 15: qwen3.attention.head_count u32 = 32 llama_model_loader: - kv 16: qwen3.attention.head_count_kv u32 = 8 llama_model_loader: - kv 17: qwen3.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 18: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 19: qwen3.attention.key_length u32 = 128 llama_model_loader: - kv 20: qwen3.attention.value_length u32 = 128 llama_model_loader: - kv 21: qwen3.pooling_type u32 = 3 llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 23: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,151665] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,151665] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 
llama_model_loader: - kv  27: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv  28: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv  29: tokenizer.ggml.eot_token_id u32 = 151645
llama_model_loader: - kv  30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv  31: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv  32: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv  33: tokenizer.chat_template str = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  34: general.quantization_version u32 = 2
llama_model_loader: - kv  35: general.file_type u32 = 7
llama_model_loader: - type f32: 145 tensors
llama_model_loader: - type q8_0: 253 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 7.49 GiB (8.50 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen3
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 7.57 B
print_info: general.name = Qwen3 Embedding 8B
print_info: vocab type = BPE
print_info: n_vocab = 151665
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151643 '<|endoftext|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-08-22T08:25:21.072Z level=DEBUG source=gpu.go:393 msg="updating system memory data" before.total="31.2 GiB" before.free="28.6 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="28.5 GiB" now.free_swap="8.0 GiB"
time=2025-08-22T08:25:21.081Z level=INFO source=server.go:383 msg="starting runner" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb --port 41285"
time=2025-08-22T08:25:21.081Z level=DEBUG source=server.go:384 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/ollama OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/lib/ollama
time=2025-08-22T08:25:21.081Z level=DEBUG source=gpu.go:393 msg="updating system memory data" before.total="31.2 GiB" before.free="28.5 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="28.5 GiB" now.free_swap="8.0 GiB"
time=2025-08-22T08:25:21.081Z level=INFO source=server.go:488 msg="system memory" total="31.2 GiB" free="28.5 GiB" free_swap="8.0 GiB"
time=2025-08-22T08:25:21.081Z level=DEBUG source=memory.go:177 msg=evaluating library=cpu gpu_count=1 available="[28.6 GiB]"
time=2025-08-22T08:25:21.081Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=qwen3.vision.block_count default=0
time=2025-08-22T08:25:21.081Z level=DEBUG source=memory.go:177 msg=evaluating library=cpu gpu_count=1 available="[28.6 GiB]"
time=2025-08-22T08:25:21.081Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=qwen3.vision.block_count default=0
time=2025-08-22T08:25:21.082Z level=INFO source=server.go:531 msg=offload library=cpu layers.requested=-1 layers.model=37 layers.offload=0 layers.split=[] memory.available="[28.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="8.6 GiB" memory.required.partial="0 B" memory.required.kv="576.0 MiB" memory.required.allocations="[8.6 GiB]" memory.weights.total="7.5 GiB" memory.weights.repeating="6.9 GiB" memory.weights.nonrepeating="629.5 MiB" memory.graph.full="384.0 MiB" memory.graph.partial="384.0 MiB"
time=2025-08-22T08:25:21.129Z level=INFO source=runner.go:864 msg="starting go runner"
time=2025-08-22T08:25:21.129Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
time=2025-08-22T08:25:21.132Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-08-22T08:25:21.133Z level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:41285"
time=2025-08-22T08:25:21.134Z level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:16 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-08-22T08:25:21.135Z level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
time=2025-08-22T08:25:21.135Z level=INFO source=server.go:1268 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from /root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = qwen3
llama_model_loader: - kv   1: general.type str = model
llama_model_loader: - kv   2: general.name str = Qwen3 Embedding 8B
llama_model_loader: - kv   3: general.basename str = Qwen3-Embedding
llama_model_loader: - kv   4: general.size_label str = 8B
llama_model_loader: - kv   5: general.license str = apache-2.0
llama_model_loader: - kv   6: general.base_model.count u32 = 1
llama_model_loader: - kv   7: general.base_model.0.name str = Qwen3 8B Base
llama_model_loader: - kv   8: general.base_model.0.organization str = Qwen
llama_model_loader: - kv   9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-8B-...
llama_model_loader: - kv  10: general.tags arr[str,5] = ["transformers", "sentence-transforme...
llama_model_loader: - kv  11: qwen3.block_count u32 = 36
llama_model_loader: - kv  12: qwen3.context_length u32 = 40960
llama_model_loader: - kv  13: qwen3.embedding_length u32 = 4096
llama_model_loader: - kv  14: qwen3.feed_forward_length u32 = 12288
llama_model_loader: - kv  15: qwen3.attention.head_count u32 = 32
llama_model_loader: - kv  16: qwen3.attention.head_count_kv u32 = 8
llama_model_loader: - kv  17: qwen3.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv  18: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv  19: qwen3.attention.key_length u32 = 128
llama_model_loader: - kv  20: qwen3.attention.value_length u32 = 128
llama_model_loader: - kv  21: qwen3.pooling_type u32 = 3
llama_model_loader: - kv  22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv  23: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv  24: tokenizer.ggml.tokens arr[str,151665] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  25: tokenizer.ggml.token_type arr[i32,151665] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  26: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  27: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv  28: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv  29: tokenizer.ggml.eot_token_id u32 = 151645
llama_model_loader: - kv  30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv  31: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv  32: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv  33: tokenizer.chat_template str = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  34: general.quantization_version u32 = 2
llama_model_loader: - kv  35: general.file_type u32 = 7
llama_model_loader: - type f32: 145 tensors
llama_model_loader: - type q8_0: 253 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 7.49 GiB (8.50 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen3
print_info: vocab_only = 0
print_info: n_ctx_train = 40960
print_info: n_embd = 4096
print_info: n_layer = 36
print_info: n_head = 32
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 4
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 12288
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 3
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 40960
print_info: rope_finetuned = unknown
print_info: model type = 8B
print_info: model params = 7.57 B
print_info: general.name = Qwen3 Embedding 8B
print_info: vocab type = BPE
print_info: n_vocab = 151665
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151643 '<|endoftext|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: layer   0 assigned to device CPU, is_swa = 0
load_tensors: layer   1 assigned to device CPU, is_swa = 0
load_tensors: layer   2 assigned to device CPU, is_swa = 0
load_tensors: layer   3 assigned to device CPU, is_swa = 0
load_tensors: layer   4 assigned to device CPU, is_swa = 0
load_tensors: layer   5 assigned to device CPU, is_swa = 0
load_tensors: layer   6 assigned to device CPU, is_swa = 0
load_tensors: layer   7 assigned to device CPU, is_swa = 0
load_tensors: layer   8 assigned to device CPU, is_swa = 0
load_tensors: layer   9 assigned to device CPU, is_swa = 0
load_tensors: layer  10 assigned to device CPU, is_swa = 0
load_tensors: layer  11 assigned to device CPU, is_swa = 0
load_tensors: layer  12 assigned to device CPU, is_swa = 0
load_tensors: layer  13 assigned to device CPU, is_swa = 0
load_tensors: layer  14 assigned to device CPU, is_swa = 0
load_tensors: layer  15 assigned to device CPU, is_swa = 0
load_tensors: layer  16 assigned to device CPU, is_swa = 0
load_tensors: layer  17 assigned to device CPU, is_swa = 0
load_tensors: layer  18 assigned to device CPU, is_swa = 0
load_tensors: layer  19 assigned to device CPU, is_swa = 0
load_tensors: layer  20 assigned to device CPU, is_swa = 0
load_tensors: layer  21 assigned to device CPU, is_swa = 0
load_tensors: layer  22 assigned to device CPU, is_swa = 0
load_tensors: layer  23 assigned to device CPU, is_swa = 0
load_tensors: layer  24 assigned to device CPU, is_swa = 0
load_tensors: layer  25 assigned to device CPU, is_swa = 0
load_tensors: layer  26 assigned to device CPU, is_swa = 0
load_tensors: layer  27 assigned to device CPU, is_swa = 0
load_tensors: layer  28 assigned to device CPU, is_swa = 0
load_tensors: layer  29 assigned to device CPU, is_swa = 0
load_tensors: layer  30 assigned to device CPU, is_swa = 0
load_tensors: layer  31 assigned to device CPU, is_swa = 0
load_tensors: layer  32 assigned to device CPU, is_swa = 0
load_tensors: layer  33 assigned to device CPU, is_swa = 0
load_tensors: layer  34 assigned to device CPU, is_swa = 0
load_tensors: layer  35 assigned to device CPU, is_swa = 0
load_tensors: layer  36 assigned to device CPU, is_swa = 0
load_tensors: CPU model buffer size = 7668.64 MiB
load_all_data: no device found for buffer type CPU for async uploads
time=2025-08-22T08:25:21.637Z level=DEBUG source=server.go:1278 msg="model load progress 0.18"
time=2025-08-22T08:25:21.888Z level=DEBUG source=server.go:1278 msg="model load progress 0.30"
time=2025-08-22T08:25:22.139Z level=DEBUG source=server.go:1278 msg="model load progress 0.43"
time=2025-08-22T08:25:22.390Z level=DEBUG source=server.go:1278 msg="model load progress 0.52"
time=2025-08-22T08:25:22.640Z level=DEBUG source=server.go:1278 msg="model load progress 0.60"
time=2025-08-22T08:25:22.891Z level=DEBUG source=server.go:1278 msg="model load progress 0.67"
time=2025-08-22T08:25:23.142Z level=DEBUG source=server.go:1278 msg="model load progress 0.78"
time=2025-08-22T08:25:23.393Z level=DEBUG source=server.go:1278 msg="model load progress 0.87"
time=2025-08-22T08:25:23.644Z level=DEBUG source=server.go:1278 msg="model load progress 0.97"
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
set_abort_callback: call
llama_context: CPU output buffer size = 0.59 MiB
create_memory: n_ctx = 4096 (padded)
llama_kv_cache_unified: layer   0: dev = CPU
llama_kv_cache_unified: layer   1: dev = CPU
llama_kv_cache_unified: layer   2: dev = CPU
llama_kv_cache_unified: layer   3: dev = CPU
llama_kv_cache_unified: layer   4: dev = CPU
llama_kv_cache_unified: layer   5: dev = CPU
llama_kv_cache_unified: layer   6: dev = CPU
llama_kv_cache_unified: layer   7: dev = CPU
llama_kv_cache_unified: layer   8: dev = CPU
llama_kv_cache_unified: layer   9: dev = CPU
llama_kv_cache_unified: layer  10: dev = CPU
llama_kv_cache_unified: layer  11: dev = CPU
llama_kv_cache_unified: layer  12: dev = CPU
llama_kv_cache_unified: layer  13: dev = CPU
llama_kv_cache_unified: layer  14: dev = CPU
llama_kv_cache_unified: layer  15: dev = CPU
llama_kv_cache_unified: layer  16: dev = CPU
llama_kv_cache_unified: layer  17: dev = CPU
llama_kv_cache_unified: layer  18: dev = CPU
llama_kv_cache_unified: layer  19: dev = CPU
llama_kv_cache_unified: layer  20: dev = CPU
llama_kv_cache_unified: layer  21: dev = CPU
llama_kv_cache_unified: layer  22: dev = CPU
llama_kv_cache_unified: layer  23: dev = CPU
llama_kv_cache_unified: layer  24: dev = CPU
llama_kv_cache_unified: layer  25: dev = CPU
llama_kv_cache_unified: layer  26: dev = CPU
llama_kv_cache_unified: layer  27: dev = CPU
llama_kv_cache_unified: layer  28: dev = CPU
llama_kv_cache_unified: layer  29: dev = CPU
llama_kv_cache_unified: layer  30: dev = CPU
llama_kv_cache_unified: layer  31: dev = CPU
llama_kv_cache_unified: layer  32: dev = CPU
llama_kv_cache_unified: layer  33: dev = CPU
llama_kv_cache_unified: layer  34: dev = CPU
llama_kv_cache_unified: layer  35: dev = CPU
llama_kv_cache_unified: CPU KV buffer size = 576.00 MiB
llama_kv_cache_unified: size = 576.00 MiB ( 4096 cells, 36 layers, 1/1 seqs), K (f16): 288.00 MiB, V (f16): 288.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 1
llama_context: max_nodes = 3184
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 0
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: CPU compute buffer size = 316.23 MiB
llama_context: graph nodes = 1411
llama_context: graph splits = 1
time=2025-08-22T08:25:23.894Z level=INFO source=server.go:1272 msg="llama runner started in 2.81 seconds"
time=2025-08-22T08:25:23.894Z level=INFO source=sched.go:473 msg="loaded runners" count=1
time=2025-08-22T08:25:23.894Z level=INFO source=server.go:1234 msg="waiting for llama runner to start responding"
time=2025-08-22T08:25:23.895Z level=INFO source=server.go:1272 msg="llama runner started in 2.81 seconds"
time=2025-08-22T08:25:23.895Z level=DEBUG source=sched.go:485 msg="finished setting up" runner.name=hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0 runner.inference=cpu runner.devices=1 runner.size="8.6 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=21 runner.model=/root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb runner.num_ctx=4096
time=2025-08-22T08:25:23.903Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-08-22T08:25:23.904Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=2 used=0 remaining=2
//ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:5280: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
/usr/lib/ollama/libggml-base.so(+0x151a8)[0x7f59980c31a8]
/usr/lib/ollama/libggml-base.so(ggml_print_backtrace+0x1e6)[0x7f59980c3576]
/usr/lib/ollama/libggml-base.so(ggml_abort+0x11d)[0x7f59980c36fd]
/usr/lib/ollama/libggml-cpu-alderlake.so(+0x5d79e)[0x7f599371b79e]
/usr/lib/ollama/libggml-cpu-alderlake.so(+0x11c51)[0x7f59936cfc51]
/usr/lib/ollama/libggml-cpu-alderlake.so(ggml_graph_compute+0xdc)[0x7f59936d205c]
/usr/lib/ollama/libggml-cpu-alderlake.so(+0x144a0)[0x7f59936d24a0]
/usr/bin/ollama(+0x109bf75)[0x55dc081f1f75]
/usr/bin/ollama(+0x110f7a1)[0x55dc082657a1]
/usr/bin/ollama(+0x110fac2)[0x55dc08265ac2]
/usr/bin/ollama(+0x1115da7)[0x55dc0826bda7]
/usr/bin/ollama(+0x1116c5c)[0x55dc0826cc5c]
/usr/bin/ollama(+0x1034d21)[0x55dc0818ad21]
/usr/bin/ollama(+0x360c21)[0x55dc074b6c21]
SIGABRT: abort
PC=0x7f59e201ab2c m=0 sigcode=18446744073709551610
signal arrived during cgo execution
goroutine 16 gp=0xc000102a80 m=0 mp=0x55dc091f7d20 [syscall]:
runtime.cgocall(0x55dc0818ace0, 0xc0000bdbd8)
runtime/cgocall.go:167 +0x4b fp=0xc0000bdbb0 sp=0xc0000bdb78 pc=0x55dc074ac3eb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7f598800a0d0, {0x2, 0x7f59880c7110, 0x0, 0x7f59880cbb40, 0x7f5988007150, 0x7f598a6e66d0, 0x7f5988010af0})
_cgo_gotypes.go:668 +0x4a fp=0xc0000bdbd8 sp=0xc0000bdbb0 pc=0x55dc0785b90a
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
github.com/ollama/ollama/llama/llama.go:150
github.com/ollama/ollama/llama.(*Context).Decode(0xc00068d1a0?, 0x1?)
github.com/ollama/ollama/llama/llama.go:150 +0xed fp=0xc0000bdcc0 sp=0xc0000bdbd8 pc=0x55dc0785e6ed
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0004e12c0, 0xc000692640, 0xc0004adf28)
github.com/ollama/ollama/runner/llamarunner/runner.go:441 +0x209 fp=0xc0000bdee8 sp=0xc0000bdcc0 pc=0x55dc07924f29
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004e12c0, {0x55dc089553e0, 0xc0003a4730})
github.com/ollama/ollama/runner/llamarunner/runner.go:346 +0x1d5 fp=0xc0000bdfb8 sp=0xc0000bdee8 pc=0x55dc07924bb5
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x28 fp=0xc0000bdfe0 sp=0xc0000bdfb8 pc=0x55dc07929908
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000bdfe8 sp=0xc0000bdfe0 pc=0x55dc074b6fa1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
github.com/ollama/ollama/runner/llamarunner/runner.go:880 +0x4c5
goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00058f790 sp=0xc00058f770 pc=0x55dc074af86e
runtime.netpollblock(0xc00058f7e0?, 0x7448666?, 0xdc?)
runtime/netpoll.go:575 +0xf7 fp=0xc00058f7c8 sp=0xc00058f790 pc=0x55dc07474357
internal/poll.runtime_pollWait(0x7f599acaceb0, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc00058f7e8 sp=0xc00058f7c8 pc=0x55dc074aea85
internal/poll.(*pollDesc).wait(0xc000685200?, 0x900000036?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00058f810 sp=0xc00058f7e8 pc=0x55dc07535ec7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000685200)
internal/poll/fd_unix.go:620 +0x295 fp=0xc00058f8b8 sp=0xc00058f810 pc=0x55dc0753b295
net.(*netFD).accept(0xc000685200)
net/fd_unix.go:172 +0x29 fp=0xc00058f970 sp=0xc00058f8b8 pc=0x55dc075ae249
net.(*TCPListener).accept(0xc00044d100)
net/tcpsock_posix.go:159 +0x1b fp=0xc00058f9c0 sp=0xc00058f970 pc=0x55dc075c3bfb
net.(*TCPListener).Accept(0xc00044d100)
net/tcpsock.go:380 +0x30 fp=0xc00058f9f0 sp=0xc00058f9c0 pc=0x55dc075c2ab0
net/http.(*onceCloseListener).Accept(0xc0004e83f0?)
<autogenerated>:1 +0x24 fp=0xc00058fa08 sp=0xc00058f9f0 pc=0x55dc077da204
net/http.(*Server).Serve(0xc00004f800, {0x55dc08952f28, 0xc00044d100})
net/http/server.go:3424 +0x30c fp=0xc00058fb38 sp=0xc00058fa08 pc=0x55dc077b1acc
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034260, 0x4, 0x4})
github.com/ollama/ollama/runner/llamarunner/runner.go:901 +0x8f5 fp=0xc00058fd08 sp=0xc00058fb38 pc=0x55dc07929695
github.com/ollama/ollama/runner.Execute({0xc000034250?, 0x0?, 0x0?})
github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc00058fd30 sp=0xc00058fd08 pc=0x55dc079b3c34
github.com/ollama/ollama/cmd.NewCLI.func2(0xc00004f500?, {0x55dc0846e081?, 0x4?, 0x55dc0846e085?})
github.com/ollama/ollama/cmd/cmd.go:1583 +0x45 fp=0xc00058fd58 sp=0xc00058fd30 pc=0x55dc08118ce5
github.com/spf13/cobra.(*Command).execute(0xc0004eaf08, {0xc00044cf40, 0x4, 0x4})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00058fe78 sp=0xc00058fd58 pc=0x55dc0762789c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0006a1508)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00058ff30 sp=0xc00058fe78 pc=0x55dc076280e5
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00058ff50 sp=0xc00058ff30 pc=0x55dc081197cd
runtime.main()
runtime/proc.go:283 +0x29d fp=0xc00058ffe0 sp=0xc00058ff50 pc=0x55dc0747b9dd
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00058ffe8 sp=0xc00058ffe0 pc=0x55dc074b6fa1
goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000aafa8 sp=0xc0000aaf88 pc=0x55dc074af86e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.forcegchelper()
runtime/proc.go:348 +0xb8 fp=0xc0000aafe0 sp=0xc0000aafa8 pc=0x55dc0747bd18
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000aafe8 sp=0xc0000aafe0 pc=0x55dc074b6fa1
created by runtime.init.7 in goroutine 1
runtime/proc.go:336 +0x1a
goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000ab780 sp=0xc0000ab760 pc=0x55dc074af86e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.bgsweep(0xc0000d6000)
runtime/mgcsweep.go:316 +0xdf fp=0xc0000ab7c8 sp=0xc0000ab780 pc=0x55dc074664bf
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc0000ab7e0 sp=0xc0000ab7c8 pc=0x55dc0745a8a5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ab7e8 sp=0xc0000ab7e0 pc=0x55dc074b6fa1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66
goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x55dc08631af8?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000abf78 sp=0xc0000abf58 pc=0x55dc074af86e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.(*scavengerState).park(0x55dc091f4f00)
runtime/mgcscavenge.go:425 +0x49 fp=0xc0000abfa8 sp=0xc0000abf78 pc=0x55dc07463f09
runtime.bgscavenge(0xc0000d6000)
runtime/mgcscavenge.go:658 +0x59 fp=0xc0000abfc8 sp=0xc0000abfa8 pc=0x55dc07464499
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc0000abfe0 sp=0xc0000abfc8 pc=0x55dc0745a845
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000abfe8 sp=0xc0000abfe0 pc=0x55dc074b6fa1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5
goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc0000aa688?)
runtime/proc.go:435 +0xce fp=0xc0000aa630 sp=0xc0000aa610 pc=0x55dc074af86e
runtime.runfinq()
runtime/mfinal.go:196 +0x107 fp=0xc0000aa7e0 sp=0xc0000aa630 pc=0x55dc07459867
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000aa7e8 sp=0xc0000aa7e0 pc=0x55dc074b6fa1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:166 +0x3d
goroutine 6 gp=0xc0001fa8c0 m=nil [chan receive]:
runtime.gopark(0xc00025f540?, 0xc000118018?, 0x60?, 0xc7?, 0x55dc07594e88?)
runtime/proc.go:435 +0xce fp=0xc0000ac718 sp=0xc0000ac6f8 pc=0x55dc074af86e
runtime.chanrecv(0xc0000e2310, 0x0, 0x1)
runtime/chan.go:664 +0x445 fp=0xc0000ac790 sp=0xc0000ac718 pc=0x55dc0744b245
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:506 +0x12 fp=0xc0000ac7b8 sp=0xc0000ac790 pc=0x55dc0744add2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1799 +0x2f fp=0xc0000ac7e0 sp=0xc0000ac7b8 pc=0x55dc0745da4f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ac7e8 sp=0xc0000ac7e0 pc=0x55dc074b6fa1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1794 +0x85
goroutine 7 gp=0xc0001fac40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000acf38 sp=0xc0000acf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000acfc8 sp=0xc0000acf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000acfe0 sp=0xc0000acfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000acfe8 sp=0xc0000acfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 18 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000a6738 sp=0xc0000a6718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000a67c8 sp=0xc0000a6738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000a67e0 sp=0xc0000a67c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a67e8 sp=0xc0000a67e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 34 gp=0xc000502380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000518738 sp=0xc000518718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005187c8 sp=0xc000518738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005187e0 sp=0xc0005187c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005187e8 sp=0xc0005187e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 8 gp=0xc0001fae00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000ad738 sp=0xc0000ad718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000ad7c8 sp=0xc0000ad738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000ad7e0 sp=0xc0000ad7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ad7e8 sp=0xc0000ad7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 50 gp=0xc000584000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000514738 sp=0xc000514718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005147c8 sp=0xc000514738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005147e0 sp=0xc0005147c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005147e8 sp=0xc0005147e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 51 gp=0xc0005841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000514f38 sp=0xc000514f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000514fc8 sp=0xc000514f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000514fe0 sp=0xc000514fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000514fe8 sp=0xc000514fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 35 gp=0xc000502540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000518f38 sp=0xc000518f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000518fc8 sp=0xc000518f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000518fe0 sp=0xc000518fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000518fe8 sp=0xc000518fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 9 gp=0xc0001fafc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0000adf38 sp=0xc0000adf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0000adfc8 sp=0xc0000adf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000adfe0 sp=0xc0000adfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000adfe8 sp=0xc0000adfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 52 gp=0xc000584380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000515738 sp=0xc000515718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005157c8 sp=0xc000515738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005157e0 sp=0xc0005157c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005157e8 sp=0xc0005157e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 53 gp=0xc000584540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000515f38 sp=0xc000515f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000515fc8 sp=0xc000515f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000515fe0 sp=0xc000515fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000515fe8 sp=0xc000515fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 54 gp=0xc000584700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000516738 sp=0xc000516718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005167c8 sp=0xc000516738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005167e0 sp=0xc0005167c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005167e8 sp=0xc0005167e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 36 gp=0xc000502700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000519738 sp=0xc000519718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005197c8 sp=0xc000519738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005197e0 sp=0xc0005197c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005197e8 sp=0xc0005197e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 10 gp=0xc0001fb180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004aa738 sp=0xc0004aa718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004aa7c8 sp=0xc0004aa738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004aa7e0 sp=0xc0004aa7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004aa7e8 sp=0xc0004aa7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 55 gp=0xc0005848c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000516f38 sp=0xc000516f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000516fc8 sp=0xc000516f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000516fe0 sp=0xc000516fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000516fe8 sp=0xc000516fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 37 gp=0xc0005028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000519f38 sp=0xc000519f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000519fc8 sp=0xc000519f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000519fe0 sp=0xc000519fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000519fe8 sp=0xc000519fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 11 gp=0xc0001fb340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004aaf38 sp=0xc0004aaf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004aafc8 sp=0xc0004aaf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004aafe0 sp=0xc0004aafc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004aafe8 sp=0xc0004aafe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 56 gp=0xc000584a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000517738 sp=0xc000517718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0005177c8 sp=0xc000517738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0005177e0 sp=0xc0005177c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005177e8 sp=0xc0005177e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 38 gp=0xc000502a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00051a738 sp=0xc00051a718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc00051a7c8 sp=0xc00051a738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00051a7e0 sp=0xc00051a7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00051a7e8 sp=0xc00051a7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 12 gp=0xc0001fb500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004ab738 sp=0xc0004ab718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004ab7c8 sp=0xc0004ab738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004ab7e0 sp=0xc0004ab7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004ab7e8 sp=0xc0004ab7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 57 gp=0xc000584c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000517f38 sp=0xc000517f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc000517fc8 sp=0xc000517f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000517fe0 sp=0xc000517fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000517fe8 sp=0xc000517fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 39 gp=0xc000502c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00051af38 sp=0xc00051af18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc00051afc8 sp=0xc00051af38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00051afe0 sp=0xc00051afc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00051afe8 sp=0xc00051afe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 40 gp=0xc000502e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00051b738 sp=0xc00051b718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc00051b7c8 sp=0xc00051b738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00051b7e0 sp=0xc00051b7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00051b7e8 sp=0xc00051b7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 13 gp=0xc0001fb6c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004abf38 sp=0xc0004abf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004abfc8 sp=0xc0004abf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004abfe0 sp=0xc0004abfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004abfe8 sp=0xc0004abfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 58 gp=0xc000584e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a6738 sp=0xc0004a6718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a67c8 sp=0xc0004a6738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a67e0 sp=0xc0004a67c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a67e8 sp=0xc0004a67e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 41 gp=0xc000502fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00051bf38 sp=0xc00051bf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc00051bfc8 sp=0xc00051bf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00051bfe0 sp=0xc00051bfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00051bfe8 sp=0xc00051bfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 14 gp=0xc0001fb880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004ac738 sp=0xc0004ac718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004ac7c8 sp=0xc0004ac738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004ac7e0 sp=0xc0004ac7c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004ac7e8 sp=0xc0004ac7e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 15 gp=0xc0001fba40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004acf38 sp=0xc0004acf18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004acfc8 sp=0xc0004acf38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004acfe0 sp=0xc0004acfc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004acfe8 sp=0xc0004acfe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 59 gp=0xc000584fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a6f38 sp=0xc0004a6f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a6fc8 sp=0xc0004a6f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a6fe0 sp=0xc0004a6fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a6fe8 sp=0xc0004a6fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 60 gp=0xc000585180 m=nil [GC worker (idle)]:
runtime.gopark(0x52663eb4c0ff?, 0x1?, 0x56?, 0xd7?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a7738 sp=0xc0004a7718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a77c8 sp=0xc0004a7738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a77e0 sp=0xc0004a77c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a77e8 sp=0xc0004a77e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 61 gp=0xc000585340 m=nil [GC worker (idle)]:
runtime.gopark(0x55dc092a4bc0?, 0x1?, 0x5c?, 0xa1?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a7f38 sp=0xc0004a7f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a7fc8 sp=0xc0004a7f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a7fe0 sp=0xc0004a7fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a7fe8 sp=0xc0004a7fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 62 gp=0xc000585500 m=nil [GC worker (idle)]:
runtime.gopark(0x55dc092a4bc0?, 0x1?, 0x3b?, 0xc0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a8738 sp=0xc0004a8718 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a87c8 sp=0xc0004a8738 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a87e0 sp=0xc0004a87c8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a87e8 sp=0xc0004a87e0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 63 gp=0xc0005856c0 m=nil [GC worker (idle)]:
runtime.gopark(0x55dc092a4bc0?, 0x1?, 0x7e?, 0xe3?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0004a8f38 sp=0xc0004a8f18 pc=0x55dc074af86e
runtime.gcBgMarkWorker(0xc0000e3730)
runtime/mgc.go:1423 +0xe9 fp=0xc0004a8fc8 sp=0xc0004a8f38 pc=0x55dc0745cd69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0004a8fe0 sp=0xc0004a8fc8 pc=0x55dc0745cc45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0004a8fe8 sp=0xc0004a8fe0 pc=0x55dc074b6fa1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105
goroutine 66 gp=0xc000102c40 m=nil [chan receive]:
runtime.gopark(0x55dc074b4fb4?, 0xc0000478d8?, 0xf0?, 0xec?, 0xc0000478c0?)
runtime/proc.go:435 +0xce fp=0xc0000478a0 sp=0xc000047880 pc=0x55dc074af86e
runtime.chanrecv(0xc0003b3960, 0xc000047a70, 0x1)
runtime/chan.go:664 +0x445 fp=0xc000047918 sp=0xc0000478a0 pc=0x55dc0744b245
runtime.chanrecv1(0xc00031e5a0?, 0xc00044d500?)
runtime/chan.go:506 +0x12 fp=0xc000047940 sp=0xc000047918 pc=0x55dc0744add2
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings(0xc0004e12c0, {0x55dc08953108, 0xc00069ed20}, 0xc0004d3e00)
github.com/ollama/ollama/runner/llamarunner/runner.go:712 +0x697 fp=0xc000047ac0 sp=0xc000047940 pc=0x55dc07927657
github.com/ollama/ollama/runner/llamarunner.(*Server).embeddings-fm({0x55dc08953108?, 0xc00069ed20?}, 0xc000047b40?)
:1 +0x36 fp=0xc000047af0 sp=0xc000047ac0 pc=0x55dc07929c96
net/http.HandlerFunc.ServeHTTP(0xc00053f2c0?, {0x55dc08953108?, 0xc00069ed20?}, 0xc000047b60?)
net/http/server.go:2294 +0x29 fp=0xc000047b18 sp=0xc000047af0 pc=0x55dc077ae109
net/http.(*ServeMux).ServeHTTP(0x55dc07453d85?, {0x55dc08953108, 0xc00069ed20}, 0xc0004d3e00)
net/http/server.go:2822 +0x1c4 fp=0xc000047b68 sp=0xc000047b18 pc=0x55dc077b0004
net/http.serverHandler.ServeHTTP({0x55dc0894f790?}, {0x55dc08953108?, 0xc00069ed20?}, 0x1?)
net/http/server.go:3301 +0x8e fp=0xc000047b98 sp=0xc000047b68 pc=0x55dc077cda8e
net/http.(*conn).serve(0xc0004e83f0, {0x55dc089553a8, 0xc000255e60})
net/http/server.go:2102 +0x625 fp=0xc000047fb8 sp=0xc000047b98 pc=0x55dc077ac605
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3454 +0x28 fp=0xc000047fe0 sp=0xc000047fb8 pc=0x55dc077b1ec8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x55dc074b6fa1
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3454 +0x485
goroutine 76 gp=0xc000102e00 m=nil [IO wait]:
runtime.gopark(0x55dc09189600?, 0xc0000a6e38?, 0x38?, 0x6e?, 0xb?)
runtime/proc.go:435 +0xce fp=0xc0000a6dd8 sp=0xc0000a6db8 pc=0x55dc074af86e
runtime.netpollblock(0x55dc074d2bd8?, 0x7448666?, 0xdc?)
runtime/netpoll.go:575 +0xf7 fp=0xc0000a6e10 sp=0xc0000a6dd8 pc=0x55dc07474357
internal/poll.runtime_pollWait(0x7f599acacd98, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc0000a6e30 sp=0xc0000a6e10 pc=0x55dc074aea85
internal/poll.(*pollDesc).wait(0xc000685280?, 0xc0002ac101?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000a6e58 sp=0xc0000a6e30 pc=0x55dc07535ec7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000685280, {0xc0002ac101, 0x1, 0x1})
internal/poll/fd_unix.go:165 +0x27a fp=0xc0000a6ef0 sp=0xc0000a6e58 pc=0x55dc075371ba
net.(*netFD).Read(0xc000685280, {0xc0002ac101?, 0xc00044d1d8?, 0xc0000a6f70?})
net/fd_posix.go:55 +0x25 fp=0xc0000a6f38 sp=0xc0000a6ef0 pc=0x55dc075ac2a5
net.(*conn).Read(0xc0000ae920, {0xc0002ac101?, 0x0?, 0x0?})
net/net.go:194 +0x45 fp=0xc0000a6f80 sp=0xc0000a6f38 pc=0x55dc075ba665
net/http.(*connReader).backgroundRead(0xc0002ac0f0)
net/http/server.go:690 +0x37 fp=0xc0000a6fc8 sp=0xc0000a6f80 pc=0x55dc077a64d7
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x25 fp=0xc0000a6fe0 sp=0xc0000a6fc8 pc=0x55dc077a6405
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a6fe8 sp=0xc0000a6fe0 pc=0x55dc074b6fa1
created by net/http.(*connReader).startBackgroundRead in goroutine 66
net/http/server.go:686 +0xb6
rax 0x0
rbx 0x15
rcx 0x7f59e201ab2c
rdx 0x6
rdi 0x15
rsi 0x15
rbp 0x7fffeeab7a70
rsp 0x7fffeeab7a30
r8 0x0
r9 0x7
r10 0x8
r11 0x246
r12 0x6
r13 0x7f5993766ca0
r14 0x16
r15 0x7f59887324a0
rip 0x7f59e201ab2c
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
[GIN] 2025/08/22 - 08:25:24 | 500 | 3.236966618s | 192.168.127.1 | POST "/api/embed"
time=2025-08-22T08:25:24.131Z level=ERROR source=server.go:409 msg="llama runner terminated" error="exit status 2"
time=2025-08-22T08:25:24.132Z level=DEBUG source=sched.go:493 msg="context for request finished"
time=2025-08-22T08:25:24.132Z level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0 runner.inference=cpu runner.devices=1 runner.size="8.6 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=21 runner.model=/root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb runner.num_ctx=4096 duration=5m0s
time=2025-08-22T08:25:24.132Z level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0 runner.inference=cpu runner.devices=1 runner.size="8.6 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=21 runner.model=/root/.ollama/models/blobs/sha256-d20ddc71e8a5c4344f2343481e242233a997dc5eaff442427a945836c97b4deb runner.num_ctx=4096 refCount=0
@liangstein commented on GitHub (Aug 26, 2025):
I have updated the error log in this thread. I'm using ollama 0.11.7
@really-hzy commented on GitHub (Sep 2, 2025):
+1. This has existed from 0.11.5 through 0.11.8. There is no problem with CUDA; it reliably appears when running on the CPU.
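For anyone trying to reproduce: the crash in the trace above happens inside the `/api/embed` handler (`llamarunner.(*Server).embeddings`), so a single embedding request against a CPU-only server triggers it. A minimal sketch of that request, with the model name taken from the log above (the host/port and input string are just placeholders):

```python
import json

# Hypothetical repro payload for POST http://127.0.0.1:11434/api/embed
# (model name copied from the log above; input text is arbitrary).
payload = json.dumps({
    "model": "hf.co/Qwen/Qwen3-Embedding-8B-GGUF:Q8_0",
    "input": "hello world",
})
print(payload)

# To send it, e.g.:
#   requests.post("http://127.0.0.1:11434/api/embed", data=payload)
# On an affected CPU-only build this returns a 500 and the runner
# exits with status 2, matching the GIN log line above.
```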
@really-hzy commented on GitHub (Sep 5, 2025):
time=2025-09-05T14:32:27.553+08:00 level=INFO source=routes.go:1331 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\huangzy\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-09-05T14:32:27.581+08:00 level=INFO source=images.go:477 msg="total blobs: 221"
time=2025-09-05T14:32:27.590+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-09-05T14:32:27.597+08:00 level=INFO source=routes.go:1384 msg="Listening on 127.0.0.1:11434 (version 0.11.10)"
time=2025-09-05T14:32:27.597+08:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2025-09-05T14:32:27.597+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-09-05T14:32:27.597+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-09-05T14:32:27.597+08:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-09-05T14:32:27.597+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=24 efficiency=16 threads=32
time=2025-09-05T14:32:27.597+08:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-09-05T14:32:27.597+08:00 level=DEBUG source=gpu.go:512 msg="Searching for GPU library" name=nvml.dll
time=2025-09-05T14:32:27.597+08:00 level=DEBUG source=gpu.go:536 msg="gpu library search" globs="[C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama\nvml.dll E:\Program Files\Microsoft Visual Studio\2022\Community\VC\Redist\MSVC\14.42.34433\x64\Microsoft.VC143.CRT\nvml.dll E:\VulkanSDK\1.4.321.1\Bin\nvml.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\nvml.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\libnvvp\nvml.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvml.dll C:\Program Files\YunShu\utils\nvml.dll C:\Windows\system32\nvml.dll C:\Windows\nvml.dll C:\Windows\System32\Wbem\nvml.dll C:\Windows\System32\WindowsPowerShell\v1.0\nvml.dll C:\Windows\System32\OpenSSH\nvml.dll E:\Program Files\CMake\bin\nvml.dll C:\Users\huangzy\.local\bin\nvml.dll C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps\nvml.dll C:\Users\huangzy\.dotnet\tools\nvml.dll C:\Users\huangzy\AppData\Roaming\npm\nvml.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvml.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvml.dll C:\ProgramData\chocolatey\bin\nvml.dll E:\Program Files\Git\cmd\nvml.dll C:\Program Files\Go\bin\nvml.dll C:\TDM-GCC-64\bin\nvml.dll C:\Users\huangzy\AppData\Local\nvm\nvml.dll C:\nvm4w\nodejs\nvml.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\nvml.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp\nvml.dll C:\Program Files\NVIDIA Corporation\Nsight Compute 2025.1.0\nvml.dll C:\Program Files\NVIDIA Corporation\NVIDIA app\NvDLISR\nvml.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\nvml.dll C:\Program Files\Docker\Docker\resources\bin\nvml.dll E:\software\cwrsync_6.4.4_x64_free\bin\nvml.dll C:\Program Files\dotnet\nvml.dll E:\Windows Kits\10\Windows Performance Toolkit\nvml.dll E:\Program Files\TortoiseGit\bin\nvml.dll E:\software\iMyFone Nut Studio\.nodejs\nvml.dll C:\Users\huangzy\.local\bin\nvml.dll 
C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps\nvml.dll C:\Users\huangzy\.dotnet\tools\nvml.dll E:\Users\huangzy\AppData\Local\Programs\Microsoft VS Code\bin\nvml.dll E:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvml.dll C:\Users\huangzy\go\bin\nvml.dll C:\Users\huangzy\AppData\Local\Programs\Ollama\nvml.dll C:\Users\huangzy\.lmstudio\bin\nvml.dll C:\Users\huangzy\AppData\Local\nvm\nvml.dll C:\nvm4w\nodejs\nvml.dll C:\Users\huangzy\.dotnet\tools\nvml.dll C:\Users\huangzy\AppData\Roaming\npm\nvml.dll c:\Windows\System32\nvml.dll]"
time=2025-09-05T14:32:27.599+08:00 level=DEBUG source=gpu.go:540 msg="skipping PhysX cuda library path" path="C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\nvml.dll"
time=2025-09-05T14:32:27.599+08:00 level=DEBUG source=gpu.go:569 msg="discovered GPU libraries" paths="[C:\Windows\system32\nvml.dll c:\Windows\System32\nvml.dll]"
time=2025-09-05T14:32:27.610+08:00 level=DEBUG source=gpu.go:111 msg="nvidia-ml loaded" library=C:\Windows\system32\nvml.dll
time=2025-09-05T14:32:27.610+08:00 level=DEBUG source=gpu.go:512 msg="Searching for GPU library" name=nvcuda.dll
time=2025-09-05T14:32:27.610+08:00 level=DEBUG source=gpu.go:536 msg="gpu library search" globs="[C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama\nvcuda.dll E:\Program Files\Microsoft Visual Studio\2022\Community\VC\Redist\MSVC\14.42.34433\x64\Microsoft.VC143.CRT\nvcuda.dll E:\VulkanSDK\1.4.321.1\Bin\nvcuda.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\nvcuda.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\libnvvp\nvcuda.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvcuda.dll C:\Program Files\YunShu\utils\nvcuda.dll C:\Windows\system32\nvcuda.dll C:\Windows\nvcuda.dll C:\Windows\System32\Wbem\nvcuda.dll C:\Windows\System32\WindowsPowerShell\v1.0\nvcuda.dll C:\Windows\System32\OpenSSH\nvcuda.dll E:\Program Files\CMake\bin\nvcuda.dll C:\Users\huangzy\.local\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps\nvcuda.dll C:\Users\huangzy\.dotnet\tools\nvcuda.dll C:\Users\huangzy\AppData\Roaming\npm\nvcuda.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvcuda.dll e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvcuda.dll C:\ProgramData\chocolatey\bin\nvcuda.dll E:\Program Files\Git\cmd\nvcuda.dll C:\Program Files\Go\bin\nvcuda.dll C:\TDM-GCC-64\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\nvm\nvcuda.dll C:\nvm4w\nodejs\nvcuda.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\nvcuda.dll C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp\nvcuda.dll C:\Program Files\NVIDIA Corporation\Nsight Compute 2025.1.0\nvcuda.dll C:\Program Files\NVIDIA Corporation\NVIDIA app\NvDLISR\nvcuda.dll C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\nvcuda.dll C:\Program Files\Docker\Docker\resources\bin\nvcuda.dll E:\software\cwrsync_6.4.4_x64_free\bin\nvcuda.dll C:\Program Files\dotnet\nvcuda.dll E:\Windows Kits\10\Windows Performance Toolkit\nvcuda.dll E:\Program Files\TortoiseGit\bin\nvcuda.dll E:\software\iMyFone Nut 
Studio\.nodejs\nvcuda.dll C:\Users\huangzy\.local\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps\nvcuda.dll C:\Users\huangzy\.dotnet\tools\nvcuda.dll E:\Users\huangzy\AppData\Local\Programs\Microsoft VS Code\bin\nvcuda.dll E:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin\nvcuda.dll C:\Users\huangzy\go\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\Programs\Ollama\nvcuda.dll C:\Users\huangzy\.lmstudio\bin\nvcuda.dll C:\Users\huangzy\AppData\Local\nvm\nvcuda.dll C:\nvm4w\nodejs\nvcuda.dll C:\Users\huangzy\.dotnet\tools\nvcuda.dll C:\Users\huangzy\AppData\Roaming\npm\nvcuda.dll c:\windows\system\nvcuda.dll]"
time=2025-09-05T14:32:27.611+08:00 level=DEBUG source=gpu.go:540 msg="skipping PhysX cuda library path" path="C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\nvcuda.dll"
time=2025-09-05T14:32:27.612+08:00 level=DEBUG source=gpu.go:569 msg="discovered GPU libraries" paths=[C:\Windows\system32\nvcuda.dll]
initializing C:\Windows\system32\nvcuda.dll
dlsym: cuInit - 00007FF81CDB5F80
dlsym: cuDriverGetVersion - 00007FF81CDB6020
dlsym: cuDeviceGetCount - 00007FF81CDB6816
dlsym: cuDeviceGet - 00007FF81CDB6810
dlsym: cuDeviceGetAttribute - 00007FF81CDB6170
dlsym: cuDeviceGetUuid - 00007FF81CDB6822
dlsym: cuDeviceGetName - 00007FF81CDB681C
dlsym: cuCtxCreate_v3 - 00007FF81CDB6894
dlsym: cuMemGetInfo_v2 - 00007FF81CDB6996
dlsym: cuCtxDestroy - 00007FF81CDB68A6
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-09-05T14:32:27.624+08:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=C:\Windows\system32\nvcuda.dll
[GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f] CUDA totalMem 24563mb
[GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f] CUDA freeMem 23042mb
[GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f] Compute Capability 8.9
time=2025-09-05T14:32:27.730+08:00 level=DEBUG source=amd_windows.go:34 msg="unable to load amdhip64_6.dll, please make sure to upgrade to the latest amd driver: The specified module could not be found."
releasing cuda driver library
releasing nvml library
time=2025-09-05T14:32:27.731+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4090 D" total="24.0 GiB" available="22.5 GiB"
time=2025-09-05T14:32:43.612+08:00 level=DEBUG source=gpu.go:402 msg="updating system memory data" before.total="63.8 GiB" before.free="37.6 GiB" before.free_swap="26.8 GiB" now.total="63.8 GiB" now.free="37.5 GiB" now.free_swap="26.6 GiB"
time=2025-09-05T14:32:43.634+08:00 level=DEBUG source=gpu.go:452 msg="updating cuda memory data" gpu=GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f name="NVIDIA GeForce RTX 4090 D" overhead="0 B" before.total="24.0 GiB" before.free="22.5 GiB" now.total="24.0 GiB" now.free="21.5 GiB" now.used="2.4 GiB"
releasing nvml library
time=2025-09-05T14:32:43.636+08:00 level=DEBUG source=sched.go:188 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-09-05T14:32:43.650+08:00 level=DEBUG source=ggml.go:210 msg="key with type not found" key=general.alignment default=32
time=2025-09-05T14:32:43.650+08:00 level=DEBUG source=sched.go:208 msg="loading first model" model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b
llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3 Embedding 4B
llama_model_loader: - kv 3: general.basename str = Qwen3-Embedding
llama_model_loader: - kv 4: general.size_label str = 4B
llama_model_loader: - kv 5: general.license str = apache-2.0
llama_model_loader: - kv 6: general.base_model.count u32 = 1
llama_model_loader: - kv 7: general.base_model.0.name str = Qwen3 4B Base
llama_model_loader: - kv 8: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-4B-...
llama_model_loader: - kv 10: general.tags arr[str,5] = ["transformers", "sentence-transforme...
llama_model_loader: - kv 11: qwen3.block_count u32 = 36
llama_model_loader: - kv 12: qwen3.context_length u32 = 40960
llama_model_loader: - kv 13: qwen3.embedding_length u32 = 2560
llama_model_loader: - kv 14: qwen3.feed_forward_length u32 = 9728
llama_model_loader: - kv 15: qwen3.attention.head_count u32 = 32
llama_model_loader: - kv 16: qwen3.attention.head_count_kv u32 = 8
llama_model_loader: - kv 17: qwen3.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 18: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 19: qwen3.attention.key_length u32 = 128
llama_model_loader: - kv 20: qwen3.attention.value_length u32 = 128
llama_model_loader: - kv 21: qwen3.pooling_type u32 = 3
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,151665] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,151665] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 28: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 29: tokenizer.ggml.eot_token_id u32 = 151645
llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 31: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 33: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 34: general.quantization_version u32 = 2
llama_model_loader: - kv 35: general.file_type u32 = 15
llama_model_loader: - type f32: 145 tensors
llama_model_loader: - type q4_K: 216 tensors
llama_model_loader: - type q6_K: 37 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 2.32 GiB (4.95 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen3
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 4.02 B
print_info: general.name = Qwen3 Embedding 4B
print_info: vocab type = BPE
print_info: n_vocab = 151665
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151643 '<|endoftext|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-09-05T14:32:43.906+08:00 level=DEBUG source=gpu.go:402 msg="updating system memory data" before.total="63.8 GiB" before.free="37.5 GiB" before.free_swap="26.6 GiB" now.total="63.8 GiB" now.free="37.4 GiB" now.free_swap="26.6 GiB"
time=2025-09-05T14:32:43.925+08:00 level=DEBUG source=gpu.go:452 msg="updating cuda memory data" gpu=GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f name="NVIDIA GeForce RTX 4090 D" overhead="0 B" before.total="24.0 GiB" before.free="21.5 GiB" now.total="24.0 GiB" now.free="21.5 GiB" now.used="2.4 GiB"
releasing nvml library
time=2025-09-05T14:32:43.935+08:00 level=INFO source=server.go:398 msg="starting runner" cmd="C:\Users\huangzy\AppData\Local\Programs\Ollama\ollama.exe runner --model C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b --port 53597"
time=2025-09-05T14:32:43.936+08:00 level=DEBUG source=server.go:399 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8" CUDA_PATH_V11_3="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3" CUDA_PATH_V12_8="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8" OLLAMA_DEBUG=2 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_ORIGINS=* PATH="C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama;E:\Program Files\Microsoft Visual Studio\2022\Community\VC\Redist\MSVC\14.42.34433\x64\Microsoft.VC143.CRT;E:\VulkanSDK\1.4.321.1\Bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\libnvvp;e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin;C:\Program Files\YunShu\utils;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;E:\Program Files\CMake\bin;C:\Users\huangzy\.local\bin;C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps;C:\Users\huangzy\.dotnet\tools;C:\Users\huangzy\AppData\Roaming\npm;e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin;e:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin;C:\ProgramData\chocolatey\bin;E:\Program Files\Git\cmd;C:\Program Files\Go\bin;C:\TDM-GCC-64\bin;C:\Users\huangzy\AppData\Local\nvm;C:\nvm4w\nodejs;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp;C:\Program Files\NVIDIA Corporation\Nsight Compute 2025.1.0\;C:\Program Files\NVIDIA Corporation\NVIDIA app\NvDLISR;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Docker\Docker\resources\bin;E:\software\cwrsync_6.4.4_x64_free\bin;C:\Program Files\dotnet\;E:\Windows Kits\10\Windows Performance Toolkit\;E:\Program Files\TortoiseGit\bin;E:\software\iMyFone Nut Studio\.nodejs\;C:\Users\huangzy\.local\bin;C:\Users\huangzy\AppData\Local\Microsoft\WindowsApps;C:\Users\huangzy\.dotnet\tools;E:\Users\huangzy\AppData\Local\Programs\Microsoft VS Code\bin;E:\Users\huangzy\AppData\Local\Programs\cursor\resources\app\bin;C:\Users\huangzy\go\bin;C:\Users\huangzy\AppData\Local\Programs\Ollama;C:\Users\huangzy\.lmstudio\bin;C:\Users\huangzy\AppData\Local\nvm;C:\nvm4w\nodejs;C:\Users\huangzy\.dotnet\tools;C:\Users\huangzy\AppData\Roaming\npm;C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama" OLLAMA_LIBRARY_PATH=C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama
time=2025-09-05T14:32:43.938+08:00 level=DEBUG source=gpu.go:402 msg="updating system memory data" before.total="63.8 GiB" before.free="37.4 GiB" before.free_swap="26.6 GiB" now.total="63.8 GiB" now.free="37.4 GiB" now.free_swap="26.6 GiB"
time=2025-09-05T14:32:43.955+08:00 level=DEBUG source=gpu.go:452 msg="updating cuda memory data" gpu=GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f name="NVIDIA GeForce RTX 4090 D" overhead="0 B" before.total="24.0 GiB" before.free="21.5 GiB" now.total="24.0 GiB" now.free="21.5 GiB" now.used="2.4 GiB"
releasing nvml library
time=2025-09-05T14:32:43.956+08:00 level=INFO source=server.go:503 msg="system memory" total="63.8 GiB" free="37.4 GiB" free_swap="26.6 GiB"
time=2025-09-05T14:32:43.957+08:00 level=DEBUG source=memory.go:181 msg=evaluating library=cuda gpu_count=1 available="[21.5 GiB]"
time=2025-09-05T14:32:43.957+08:00 level=DEBUG source=ggml.go:210 msg="key with type not found" key=qwen3.vision.block_count default=0
time=2025-09-05T14:32:43.957+08:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b library=cuda parallel=1 required="3.8 GiB" gpus=1
time=2025-09-05T14:32:43.957+08:00 level=DEBUG source=memory.go:181 msg=evaluating library=cuda gpu_count=1 available="[21.5 GiB]"
time=2025-09-05T14:32:43.957+08:00 level=DEBUG source=ggml.go:210 msg="key with type not found" key=qwen3.vision.block_count default=0
time=2025-09-05T14:32:43.957+08:00 level=INFO source=server.go:543 msg=offload library=cuda layers.requested=-1 layers.model=37 layers.offload=37 layers.split=[37] memory.available="[21.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.8 GiB" memory.required.partial="3.8 GiB" memory.required.kv="576.0 MiB" memory.required.allocations="[3.8 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="2.0 GiB" memory.weights.nonrepeating="303.8 MiB" memory.graph.full="384.0 MiB" memory.graph.partial="384.0 MiB"
time=2025-09-05T14:32:43.975+08:00 level=INFO source=runner.go:864 msg="starting go runner"
time=2025-09-05T14:32:43.980+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\huangzy\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2025-09-05T14:32:43.993+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-09-05T14:32:43.994+08:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:53597"
time=2025-09-05T14:32:44.003+08:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-e8ab67aa-202d-6c54-7afb-4f56f1310c8f Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-09-05T14:32:44.004+08:00 level=INFO source=server.go:1250 msg="waiting for llama runner to start responding"
time=2025-09-05T14:32:44.004+08:00 level=INFO source=server.go:1284 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 398 tensors from C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3 Embedding 4B
llama_model_loader: - kv 3: general.basename str = Qwen3-Embedding
llama_model_loader: - kv 4: general.size_label str = 4B
llama_model_loader: - kv 5: general.license str = apache-2.0
llama_model_loader: - kv 6: general.base_model.count u32 = 1
llama_model_loader: - kv 7: general.base_model.0.name str = Qwen3 4B Base
llama_model_loader: - kv 8: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-4B-...
llama_model_loader: - kv 10: general.tags arr[str,5] = ["transformers", "sentence-transforme...
llama_model_loader: - kv 11: qwen3.block_count u32 = 36
llama_model_loader: - kv 12: qwen3.context_length u32 = 40960
llama_model_loader: - kv 13: qwen3.embedding_length u32 = 2560
llama_model_loader: - kv 14: qwen3.feed_forward_length u32 = 9728
llama_model_loader: - kv 15: qwen3.attention.head_count u32 = 32
llama_model_loader: - kv 16: qwen3.attention.head_count_kv u32 = 8
llama_model_loader: - kv 17: qwen3.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 18: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 19: qwen3.attention.key_length u32 = 128
llama_model_loader: - kv 20: qwen3.attention.value_length u32 = 128
llama_model_loader: - kv 21: qwen3.pooling_type u32 = 3
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,151665] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,151665] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 28: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 29: tokenizer.ggml.eot_token_id u32 = 151645
llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 31: tokenizer.ggml.add_eos_token bool = true
llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 33: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 34: general.quantization_version u32 = 2
llama_model_loader: - kv 35: general.file_type u32 = 15
llama_model_loader: - type f32: 145 tensors
llama_model_loader: - type q4_K: 216 tensors
llama_model_loader: - type q6_K: 37 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 2.32 GiB (4.95 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen3
print_info: vocab_only = 0
print_info: n_ctx_train = 40960
print_info: n_embd = 2560
print_info: n_layer = 36
print_info: n_head = 32
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 4
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 9728
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 3
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 40960
print_info: rope_finetuned = unknown
print_info: model type = 4B
print_info: model params = 4.02 B
print_info: general.name = Qwen3 Embedding 4B
print_info: vocab type = BPE
print_info: n_vocab = 151665
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151643 '<|endoftext|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: layer 0 assigned to device CPU, is_swa = 0
load_tensors: layer 1 assigned to device CPU, is_swa = 0
load_tensors: layer 2 assigned to device CPU, is_swa = 0
load_tensors: layer 3 assigned to device CPU, is_swa = 0
load_tensors: layer 4 assigned to device CPU, is_swa = 0
load_tensors: layer 5 assigned to device CPU, is_swa = 0
load_tensors: layer 6 assigned to device CPU, is_swa = 0
load_tensors: layer 7 assigned to device CPU, is_swa = 0
load_tensors: layer 8 assigned to device CPU, is_swa = 0
load_tensors: layer 9 assigned to device CPU, is_swa = 0
load_tensors: layer 10 assigned to device CPU, is_swa = 0
load_tensors: layer 11 assigned to device CPU, is_swa = 0
load_tensors: layer 12 assigned to device CPU, is_swa = 0
load_tensors: layer 13 assigned to device CPU, is_swa = 0
load_tensors: layer 14 assigned to device CPU, is_swa = 0
load_tensors: layer 15 assigned to device CPU, is_swa = 0
load_tensors: layer 16 assigned to device CPU, is_swa = 0
load_tensors: layer 17 assigned to device CPU, is_swa = 0
load_tensors: layer 18 assigned to device CPU, is_swa = 0
load_tensors: layer 19 assigned to device CPU, is_swa = 0
load_tensors: layer 20 assigned to device CPU, is_swa = 0
load_tensors: layer 21 assigned to device CPU, is_swa = 0
load_tensors: layer 22 assigned to device CPU, is_swa = 0
load_tensors: layer 23 assigned to device CPU, is_swa = 0
load_tensors: layer 24 assigned to device CPU, is_swa = 0
load_tensors: layer 25 assigned to device CPU, is_swa = 0
load_tensors: layer 26 assigned to device CPU, is_swa = 0
load_tensors: layer 27 assigned to device CPU, is_swa = 0
load_tensors: layer 28 assigned to device CPU, is_swa = 0
load_tensors: layer 29 assigned to device CPU, is_swa = 0
load_tensors: layer 30 assigned to device CPU, is_swa = 0
load_tensors: layer 31 assigned to device CPU, is_swa = 0
load_tensors: layer 32 assigned to device CPU, is_swa = 0
load_tensors: layer 33 assigned to device CPU, is_swa = 0
load_tensors: layer 34 assigned to device CPU, is_swa = 0
load_tensors: layer 35 assigned to device CPU, is_swa = 0
load_tensors: layer 36 assigned to device CPU, is_swa = 0
load_tensors: CPU model buffer size = 2375.37 MiB
load_all_data: no device found for buffer type CPU for async uploads
time=2025-09-05T14:32:44.256+08:00 level=DEBUG source=server.go:1294 msg="model load progress 0.21"
time=2025-09-05T14:32:44.508+08:00 level=DEBUG source=server.go:1294 msg="model load progress 0.64"
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
set_abort_callback: call
llama_context: CPU output buffer size = 0.59 MiB
create_memory: n_ctx = 4096 (padded)
llama_kv_cache_unified: layer 0: dev = CPU
llama_kv_cache_unified: layer 1: dev = CPU
llama_kv_cache_unified: layer 2: dev = CPU
llama_kv_cache_unified: layer 3: dev = CPU
llama_kv_cache_unified: layer 4: dev = CPU
llama_kv_cache_unified: layer 5: dev = CPU
llama_kv_cache_unified: layer 6: dev = CPU
llama_kv_cache_unified: layer 7: dev = CPU
llama_kv_cache_unified: layer 8: dev = CPU
llama_kv_cache_unified: layer 9: dev = CPU
llama_kv_cache_unified: layer 10: dev = CPU
llama_kv_cache_unified: layer 11: dev = CPU
llama_kv_cache_unified: layer 12: dev = CPU
llama_kv_cache_unified: layer 13: dev = CPU
llama_kv_cache_unified: layer 14: dev = CPU
llama_kv_cache_unified: layer 15: dev = CPU
llama_kv_cache_unified: layer 16: dev = CPU
llama_kv_cache_unified: layer 17: dev = CPU
llama_kv_cache_unified: layer 18: dev = CPU
llama_kv_cache_unified: layer 19: dev = CPU
llama_kv_cache_unified: layer 20: dev = CPU
llama_kv_cache_unified: layer 21: dev = CPU
llama_kv_cache_unified: layer 22: dev = CPU
llama_kv_cache_unified: layer 23: dev = CPU
llama_kv_cache_unified: layer 24: dev = CPU
llama_kv_cache_unified: layer 25: dev = CPU
llama_kv_cache_unified: layer 26: dev = CPU
llama_kv_cache_unified: layer 27: dev = CPU
llama_kv_cache_unified: layer 28: dev = CPU
llama_kv_cache_unified: layer 29: dev = CPU
llama_kv_cache_unified: layer 30: dev = CPU
llama_kv_cache_unified: layer 31: dev = CPU
llama_kv_cache_unified: layer 32: dev = CPU
llama_kv_cache_unified: layer 33: dev = CPU
llama_kv_cache_unified: layer 34: dev = CPU
llama_kv_cache_unified: layer 35: dev = CPU
llama_kv_cache_unified: CPU KV buffer size = 576.00 MiB
time=2025-09-05T14:32:44.759+08:00 level=DEBUG source=server.go:1294 msg="model load progress 1.00"
llama_kv_cache_unified: size = 576.00 MiB ( 4096 cells, 36 layers, 1/1 seqs), K (f16): 288.00 MiB, V (f16): 288.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 1
llama_context: max_nodes = 3184
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 0
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: CPU compute buffer size = 308.23 MiB
llama_context: graph nodes = 1411
llama_context: graph splits = 1
time=2025-09-05T14:32:45.010+08:00 level=INFO source=server.go:1288 msg="llama runner started in 1.07 seconds"
time=2025-09-05T14:32:45.010+08:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
time=2025-09-05T14:32:45.010+08:00 level=INFO source=server.go:1250 msg="waiting for llama runner to start responding"
time=2025-09-05T14:32:45.010+08:00 level=INFO source=server.go:1288 msg="llama runner started in 1.08 seconds"
time=2025-09-05T14:32:45.010+08:00 level=DEBUG source=sched.go:485 msg="finished setting up" runner.name=hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M runner.inference=cuda runner.devices=1 runner.size="3.8 GiB" runner.vram="3.8 GiB" runner.parallel=1 runner.pid=645292 runner.model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b runner.num_ctx=4096
time=2025-09-05T14:32:45.026+08:00 level=DEBUG source=ggml.go:210 msg="key with type not found" key=general.alignment default=32
time=2025-09-05T14:32:45.027+08:00 level=TRACE source=server.go:1554 msg="embedding request" input="Why is the sky blue?"
time=2025-09-05T14:32:45.027+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=7 used=0 remaining=7
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:5280: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
[GIN] 2025/09/05 - 14:32:45 | 500 | 1.6630946s | 127.0.0.1 | POST "/api/embed"
time=2025-09-05T14:32:45.186+08:00 level=DEBUG source=sched.go:493 msg="context for request finished"
time=2025-09-05T14:32:45.186+08:00 level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M runner.inference=cuda runner.devices=1 runner.size="3.8 GiB" runner.vram="3.8 GiB" runner.parallel=1 runner.pid=645292 runner.model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b runner.num_ctx=4096 duration=5m0s
time=2025-09-05T14:32:45.186+08:00 level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M runner.inference=cuda runner.devices=1 runner.size="3.8 GiB" runner.vram="3.8 GiB" runner.parallel=1 runner.pid=645292 runner.model=C:\Users\huangzy\.ollama\models\blobs\sha256-2b0cf8f17b4c723c27303015383c27ec4bf2d8314bb677d05e920dd70bb0f16b runner.num_ctx=4096 refCount=0
time=2025-09-05T14:32:45.273+08:00 level=ERROR source=server.go:424 msg="llama runner terminated" error="exit status 0xc0000409"
Full log with disabled GPU.
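For anyone trying to reproduce: the trace above shows the crash is triggered by a plain POST to /api/embed. A minimal sketch of the request body follows, assuming a local server; the model tag and input text are taken from the log (runner.name and the "embedding request" trace line), while the default port 11434 in the comment is an assumption, not from this log.

```python
import json

# Reproduction payload for the failing /api/embed call.
# Model tag and input mirror the log above:
#   runner.name=hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M
#   msg="embedding request" input="Why is the sky blue?"
payload = {
    "model": "hf.co/Qwen/Qwen3-Embedding-4B-GGUF:Q4_K_M",
    "input": "Why is the sky blue?",
}

body = json.dumps(payload)
print(body)

# Against a local server (default port assumed):
#   curl http://127.0.0.1:11434/api/embed -d '<body above>'
# On affected builds this returned HTTP 500 after the GGML assert killed the runner.
```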
@Muzixin commented on GitHub (Sep 12, 2025):
Same issue here.
@pdevine commented on GitHub (Sep 17, 2025):
#12301 implements qwen3-embedding on the Ollama engine (instead of the legacy llama.cpp engine).
@ubaldus commented on GitHub (Oct 13, 2025):
Hi, the error is still present in version 0.12.5 when using the CPU only.
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 32768
llama_context: n_ctx_per_seq = 32768
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = disabled
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: CPU output buffer size = 0.58 MiB
llama_kv_cache: CPU KV buffer size = 3584.00 MiB
llama_kv_cache: size = 3584.00 MiB ( 32768 cells, 28 layers, 1/1 seqs), K (f16): 1792.00 MiB, V (f16): 1792.00 MiB
llama_context: CPU compute buffer size = 1104.01 MiB
llama_context: graph nodes = 1127
llama_context: graph splits = 1
ops.cpp:4663: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
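Both traces die on the same bounds check: `i01` is a row index into a tensor whose row count is `ne01`, so the assert fires when an op is asked to read a row outside the source tensor. The specific op is not named in the log; as a rough Python analogue of the invariant (illustrative only, names chosen to mirror the assert), a row-gather looks like:

```python
# Illustrative analogue of GGML_ASSERT(i01 >= 0 && i01 < ne01):
# a row-gather reads rows of `src` selected by `ids`, and the assert
# fires when a selected index falls outside [0, ne01).
def gather_rows(src, ids):
    ne01 = len(src)  # number of rows in the source tensor
    for i01 in ids:
        if not (0 <= i01 < ne01):
            raise AssertionError(f"i01 = {i01} out of range [0, {ne01})")
    return [src[i01] for i01 in ids]

print(gather_rows([[1.0], [2.0], [3.0]], [2, 0]))  # valid: [[3.0], [1.0]]
```

An out-of-range index here (e.g. a bad token id feeding an embedding lookup) is exactly the condition the CPU backend traps on.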
@pdevine commented on GitHub (Oct 13, 2025):
OK, I can confirm it's happening on CPU on Windows. It works fine on GPU with CUDA on Windows, and is fine on both CPU and GPU on macOS.
@rick-github commented on GitHub (Oct 13, 2025):
Fails on Linux/CPU too.
@pdevine commented on GitHub (Oct 13, 2025):
My guess is x86 vs ARM. There's a GGML version bump coming, so I'll test with that and see if it was fixed upstream.
@kitche1985 commented on GitHub (Oct 15, 2025):
Fails on macOS GPU (Metal) too.
@pdevine commented on GitHub (Oct 15, 2025):
@kitche1985 I've tried reproducing it on metal both on the GPU and CPU and not seen any issues. What are you seeing?
@rasheduzzaman-brur commented on GitHub (Oct 16, 2025):
I am facing the same issue in the updated version: [ERROR] Error processing rashed.txt: Error raised by inference API HTTP code: 500, {"error":"do embedding request: Post "http://127.0.0.1:40825/embedding": EOF"}
@jmorganca commented on GitHub (Oct 27, 2025):
Hi all, this should be fixed now as of 0.12.6. Let me know if you're still seeing the issue