Mirror of https://github.com/ollama/ollama.git (synced 2026-05-06 16:11:34 -05:00)
Closed · opened 2026-04-22 13:58:48 -05:00 by GiteaMirror · 19 comments
Reference: github-starred/ollama#32573
Originally created by @liuyixia-make on GitHub (Apr 22, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10369
What is the issue?
This never happened with previous versions; the problem appeared after upgrading to 0.6.5. My environment is configured to use the GPU:
Environment="USE_GPU=True"
Environment="CUDA_VISIBLE_DEVICES=0,1"
Environment="OLLAMA_FORCE_GPU=1"
Environment="OLLAMA_SCHED_SPREAD=1"
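For context, Environment= lines like these normally live in a systemd drop-in for the Ollama service. A minimal sketch of applying them on a standard Linux install (assuming the service is named ollama.service; the specific variable values above are the reporter's own):

```shell
# Open (or create) a drop-in override for the ollama service;
# this writes /etc/systemd/system/ollama.service.d/override.conf
sudo systemctl edit ollama.service

# In the editor, add a [Service] section with the desired variables, e.g.:
#   [Service]
#   Environment="CUDA_VISIBLE_DEVICES=0,1"
#   Environment="OLLAMA_SCHED_SPREAD=1"

# Reload unit files and restart the service so the variables take effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```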
root@hpry:~# nvidia-smi
Tue Apr 22 20:17:46 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 42% 25C P8 26W / 350W | 4MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:05:00.0 Off | N/A |
| 42% 22C P8 13W / 350W | 4MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Relevant log output: (none provided in the original report)
OS: No response
GPU: No response
CPU: No response
Ollama version: No response
@rick-github commented on GitHub (Apr 22, 2025):
The full log will aid in debugging.
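For reference, on a systemd-based install the full server log can typically be captured along these lines (OLLAMA_DEBUG is Ollama's debug-logging variable; the rest is standard journalctl usage — a sketch, not an exact prescription):

```shell
# Enable debug-level logging via a systemd drop-in, then restart
sudo systemctl edit ollama.service   # add: [Service]  Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama

# Reproduce the issue, then dump the complete service log
journalctl -u ollama --no-pager > ollama-full.log
```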
@liuyixia-make commented on GitHub (Apr 22, 2025):
Apr 22 21:20:46 hpry ollama[27413]: [GIN] 2025/04/22 - 21:20:46 | 200 | 35.346µs | 127.0.0.1 | HEAD "/"
Apr 22 21:20:46 hpry ollama[27413]: [GIN] 2025/04/22 - 21:20:46 | 200 | 19.322161ms | 127.0.0.1 | POST "/api/show"
Apr 22 21:20:46 hpry ollama[27413]: time=2025-04-22T21:20:46.666+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.6 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:20:46 hpry ollama[27413]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuInit - 0x7d4c95c7cbc0
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDriverGetVersion - 0x7d4c95c7cbe0
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGetCount - 0x7d4c95c7cc20
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGet - 0x7d4c95c7cc00
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGetAttribute - 0x7d4c95c7cd00
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGetUuid - 0x7d4c95c7cc60
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuDeviceGetName - 0x7d4c95c7cc40
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuCtxCreate_v3 - 0x7d4c95c7cee0
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuMemGetInfo_v2 - 0x7d4c95c86e20
Apr 22 21:20:46 hpry ollama[27413]: dlsym: cuCtxDestroy - 0x7d4c95ce1850
Apr 22 21:20:46 hpry ollama[27413]: calling cuInit
Apr 22 21:20:46 hpry ollama[27413]: calling cuDriverGetVersion
Apr 22 21:20:46 hpry ollama[27413]: raw version 0x2f08
Apr 22 21:20:46 hpry ollama[27413]: CUDA driver version: 12.4
Apr 22 21:20:46 hpry ollama[27413]: calling cuDeviceGetCount
Apr 22 21:20:46 hpry ollama[27413]: device count 2
Apr 22 21:20:46 hpry ollama[27413]: time=2025-04-22T21:20:46.927+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.003+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: releasing cuda driver library
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.003+08:00 level=DEBUG source=sched.go:183 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.022+08:00 level=DEBUG source=sched.go:226 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.022+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=2 available="[23.4 GiB 23.4 GiB]"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.022+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.022+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:20:47 hpry ollama[27413]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuInit - 0x7d4c95c7cbc0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDriverGetVersion - 0x7d4c95c7cbe0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetCount - 0x7d4c95c7cc20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGet - 0x7d4c95c7cc00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetAttribute - 0x7d4c95c7cd00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetUuid - 0x7d4c95c7cc60
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetName - 0x7d4c95c7cc40
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxCreate_v3 - 0x7d4c95c7cee0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuMemGetInfo_v2 - 0x7d4c95c86e20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxDestroy - 0x7d4c95ce1850
Apr 22 21:20:47 hpry ollama[27413]: calling cuInit
Apr 22 21:20:47 hpry ollama[27413]: calling cuDriverGetVersion
Apr 22 21:20:47 hpry ollama[27413]: raw version 0x2f08
Apr 22 21:20:47 hpry ollama[27413]: CUDA driver version: 12.4
Apr 22 21:20:47 hpry ollama[27413]: calling cuDeviceGetCount
Apr 22 21:20:47 hpry ollama[27413]: device count 2
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.175+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: releasing cuda driver library
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=INFO source=sched.go:732 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 library=cuda parallel=4 required="23.4 GiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.243+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:20:47 hpry ollama[27413]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuInit - 0x7d4c95c7cbc0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDriverGetVersion - 0x7d4c95c7cbe0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetCount - 0x7d4c95c7cc20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGet - 0x7d4c95c7cc00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetAttribute - 0x7d4c95c7cd00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetUuid - 0x7d4c95c7cc60
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetName - 0x7d4c95c7cc40
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxCreate_v3 - 0x7d4c95c7cee0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuMemGetInfo_v2 - 0x7d4c95c86e20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxDestroy - 0x7d4c95ce1850
Apr 22 21:20:47 hpry ollama[27413]: calling cuInit
Apr 22 21:20:47 hpry ollama[27413]: calling cuDriverGetVersion
Apr 22 21:20:47 hpry ollama[27413]: raw version 0x2f08
Apr 22 21:20:47 hpry ollama[27413]: CUDA driver version: 12.4
Apr 22 21:20:47 hpry ollama[27413]: calling cuDeviceGetCount
Apr 22 21:20:47 hpry ollama[27413]: device count 2
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.393+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: releasing cuda driver library
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=INFO source=server.go:105 msg="system memory" total="125.0 GiB" free="122.5 GiB" free_swap="8.0 GiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=2 available="[23.4 GiB 23.4 GiB]"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.459+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:20:47 hpry ollama[27413]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuInit - 0x7d4c95c7cbc0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDriverGetVersion - 0x7d4c95c7cbe0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetCount - 0x7d4c95c7cc20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGet - 0x7d4c95c7cc00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetAttribute - 0x7d4c95c7cd00
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetUuid - 0x7d4c95c7cc60
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuDeviceGetName - 0x7d4c95c7cc40
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxCreate_v3 - 0x7d4c95c7cee0
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuMemGetInfo_v2 - 0x7d4c95c86e20
Apr 22 21:20:47 hpry ollama[27413]: dlsym: cuCtxDestroy - 0x7d4c95ce1850
Apr 22 21:20:47 hpry ollama[27413]: calling cuInit
Apr 22 21:20:47 hpry ollama[27413]: calling cuDriverGetVersion
Apr 22 21:20:47 hpry ollama[27413]: raw version 0x2f08
Apr 22 21:20:47 hpry ollama[27413]: CUDA driver version: 12.4
Apr 22 21:20:47 hpry ollama[27413]: calling cuDeviceGetCount
Apr 22 21:20:47 hpry ollama[27413]: device count 2
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.609+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:20:47 hpry ollama[27413]: releasing cuda driver library
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=65 layers.offload=65 layers.split=33,32 memory.available="[23.4 GiB 23.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="23.4 GiB" memory.required.partial="23.4 GiB" memory.required.kv="2.0 GiB" memory.required.allocations="[12.0 GiB 11.4 GiB]" memory.weights.total="18.1 GiB" memory.weights.repeating="17.5 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="916.1 MiB" memory.graph.partial="916.1 MiB"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=INFO source=server.go:185 msg="enabling flash attention"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=WARN source=server.go:193 msg="kv cache type not supported by model" type=""
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.669+08:00 level=DEBUG source=server.go:262 msg="compatible gpu libraries" compatible=[cuda_v11]
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 (version GGUF V3 (latest))
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 0: general.architecture str = qwen2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 1: general.type str = model
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 2: general.name str = Qwen2.5 32B Instruct
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 3: general.finetune str = Instruct
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 4: general.basename str = Qwen2.5
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 5: general.size_label str = 32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 6: general.license str = apache-2.0
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-3...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"]
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 22: general.file_type u32 = 15
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type f32: 321 tensors
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type q4_K: 385 tensors
Apr 22 21:20:47 hpry ollama[27413]: llama_model_loader: - type q6_K: 65 tensors
Apr 22 21:20:47 hpry ollama[27413]: print_info: file format = GGUF V3 (latest)
Apr 22 21:20:47 hpry ollama[27413]: print_info: file type = Q4_K - Medium
Apr 22 21:20:47 hpry ollama[27413]: print_info: file size = 18.48 GiB (4.85 BPW)
Apr 22 21:20:47 hpry ollama[27413]: init_tokenizer: initializing tokenizer for type 2
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151660 '<|fim_middle|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151653 '<|vision_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151648 '<|box_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151649 '<|box_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151655 '<|image_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151651 '<|quad_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151652 '<|vision_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151654 '<|vision_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151656 '<|video_pad|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151644 '<|im_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: control token: 151650 '<|quad_start|>' is not marked as EOG
Apr 22 21:20:47 hpry ollama[27413]: load: special tokens cache size = 22
Apr 22 21:20:47 hpry ollama[27413]: load: token to piece cache size = 0.9310 MB
Apr 22 21:20:47 hpry ollama[27413]: print_info: arch = qwen2
Apr 22 21:20:47 hpry ollama[27413]: print_info: vocab_only = 1
Apr 22 21:20:47 hpry ollama[27413]: print_info: model type = ?B
Apr 22 21:20:47 hpry ollama[27413]: print_info: model params = 32.76 B
Apr 22 21:20:47 hpry ollama[27413]: print_info: general.name = Qwen2.5 32B Instruct
Apr 22 21:20:47 hpry ollama[27413]: print_info: vocab type = BPE
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_vocab = 152064
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_merges = 151387
Apr 22 21:20:47 hpry ollama[27413]: print_info: BOS token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOS token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOT token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: PAD token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: LF token = 198 'Ċ'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM REP token = 151663 '<|repo_name|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151662 '<|fim_pad|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151663 '<|repo_name|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151664 '<|file_sep|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: max token length = 256
Apr 22 21:20:47 hpry ollama[27413]: llama_model_load: vocab only - skipping tensors
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=DEBUG source=server.go:335 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --verbose --threads 16 --flash-attn --parallel 4 --tensor-split 33,32 --port 44667"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=DEBUG source=server.go:423 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin CUDA_VISIBLE_DEVICES=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4,GPU-de554a1a-5def-5a94-397d-5512a37432da LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v11:/usr/local/lib/ollama]"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.781+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=INFO source=runner.go:853 msg="starting go runner"
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load blas: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-blas.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load cann: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-cann.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load cuda: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-cuda.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load hip: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-hip.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load kompute: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-kompute.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load metal: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-metal.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load rpc: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-rpc.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load sycl: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-sycl.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load vulkan: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-vulkan.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load opencl: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-opencl.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load musa: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-musa.so]
Apr 22 21:20:47 hpry ollama[27413]: ggml_backend_try_load_best: failed to load cpu: filesystem error: status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-cpu.so]
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.788+08:00 level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Apr 22 21:20:47 hpry ollama[27413]: time=2025-04-22T21:20:47.789+08:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:44667"
Apr 22 21:20:47 hpry ollama[27413]: print_info: arch = qwen2
Apr 22 21:20:47 hpry ollama[27413]: print_info: vocab_only = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_ctx_train = 32768
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd = 5120
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_layer = 64
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_head = 40
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_head_kv = 8
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_rot = 128
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_swa = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd_head_k = 128
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd_head_v = 128
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_gqa = 5
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd_k_gqa = 1024
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_embd_v_gqa = 1024
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_norm_eps = 0.0e+00
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_norm_rms_eps = 1.0e-06
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_clamp_kqv = 0.0e+00
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_max_alibi_bias = 0.0e+00
Apr 22 21:20:47 hpry ollama[27413]: print_info: f_logit_scale = 0.0e+00
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_ff = 27648
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_expert = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_expert_used = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: causal attn = 1
Apr 22 21:20:47 hpry ollama[27413]: print_info: pooling type = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: rope type = 2
Apr 22 21:20:47 hpry ollama[27413]: print_info: rope scaling = linear
Apr 22 21:20:47 hpry ollama[27413]: print_info: freq_base_train = 1000000.0
Apr 22 21:20:47 hpry ollama[27413]: print_info: freq_scale_train = 1
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_ctx_orig_yarn = 32768
Apr 22 21:20:47 hpry ollama[27413]: print_info: rope_finetuned = unknown
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_d_conv = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_d_inner = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_d_state = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_dt_rank = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: ssm_dt_b_c_rms = 0
Apr 22 21:20:47 hpry ollama[27413]: print_info: model type = 32B
Apr 22 21:20:47 hpry ollama[27413]: print_info: model params = 32.76 B
Apr 22 21:20:47 hpry ollama[27413]: print_info: general.name = Qwen2.5 32B Instruct
Apr 22 21:20:47 hpry ollama[27413]: print_info: vocab type = BPE
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_vocab = 152064
Apr 22 21:20:47 hpry ollama[27413]: print_info: n_merges = 151387
Apr 22 21:20:47 hpry ollama[27413]: print_info: BOS token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOS token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOT token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: PAD token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: LF token = 198 'Ċ'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM REP token = 151663 '<|repo_name|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151643 '<|endoftext|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151645 '<|im_end|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151662 '<|fim_pad|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151663 '<|repo_name|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: EOG token = 151664 '<|file_sep|>'
Apr 22 21:20:47 hpry ollama[27413]: print_info: max token length = 256
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 0 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 1 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 2 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 3 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 4 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 5 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 6 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 7 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 8 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 9 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 10 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 11 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 12 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 13 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 14 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 15 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 16 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 17 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 18 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 19 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 20 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 21 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 22 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 23 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 24 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 25 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 26 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 27 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 28 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 29 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 30 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 31 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 32 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 33 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 34 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 35 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 36 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 37 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 38 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 39 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 40 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 41 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 42 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 43 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 44 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 45 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 46 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 47 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 48 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 49 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 50 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 51 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 52 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 53 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 54 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 55 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 56 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 57 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 58 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 59 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 60 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 61 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 62 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 63 assigned to device CPU
Apr 22 21:20:47 hpry ollama[27413]: load_tensors: layer 64 assigned to device CPU
Apr 22 21:20:48 hpry ollama[27413]: time=2025-04-22T21:20:48.033+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
Apr 22 21:20:48 hpry ollama[27413]: load_tensors: CPU_Mapped model buffer size = 18926.01 MiB
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_seq_max = 4
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_ctx = 8192
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_ctx_per_seq = 2048
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_batch = 2048
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_ubatch = 512
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: flash_attn = 1
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: freq_base = 1000000.0
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: freq_scale = 1
Apr 22 21:20:48 hpry ollama[27413]: llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 64, can_shift = 1
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 40: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 41: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 42: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 43: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 44: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 45: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 46: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 47: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 48: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 49: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 50: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 51: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 52: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 53: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 54: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 55: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 56: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 57: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 58: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 59: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 60: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 61: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 62: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: llama_kv_cache_init: layer 63: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
Apr 22 21:20:48 hpry ollama[27413]: time=2025-04-22T21:20:48.784+08:00 level=DEBUG source=server.go:625 msg="model load progress 1.00"
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.035+08:00 level=DEBUG source=server.go:628 msg="model load completed, waiting for server to become available" status="llm server loading model"
Apr 22 21:20:49 hpry ollama[27413]: llama_kv_cache_init: CPU KV buffer size = 2048.00 MiB
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: KV self size = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: CPU output buffer size = 2.40 MiB
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: CPU compute buffer size = 307.00 MiB
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: graph nodes = 1991
Apr 22 21:20:49 hpry ollama[27413]: llama_init_from_model: graph splits = 1
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=INFO source=server.go:619 msg="llama runner started in 1.76 seconds"
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=DEBUG source=sched.go:464 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41
Apr 22 21:20:49 hpry ollama[27413]: [GIN] 2025/04/22 - 21:20:49 | 200 | 2.880933624s | 127.0.0.1 | POST "/api/generate"
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=DEBUG source=sched.go:468 msg="context for request finished"
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 duration=5m0s
Apr 22 21:20:49 hpry ollama[27413]: time=2025-04-22T21:20:49.536+08:00 level=DEBUG source=sched.go:359 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 refCount=0
root@hpry:~#
@rick-github commented on GitHub (Apr 22, 2025):
It seems like you have some file permission problems:
This is preventing the runner from loading the GPU enabled backends, and so only basic CPU is used for inference:
The reason that `ollama ps` reports "100% GPU" is that it fully expected to be able to load the GPU backends, but the permissions problem prevented that. What's the output of the following:
@liuyixia-make commented on GitHub (Apr 22, 2025):
root@hpry:~# p="/usr/local/lib/ollama/cuda_v11/libggml-blas.so"; ls -ld / $(while [ "$p" != "/" ]; do echo "$p"; p=$(dirname "$p"); done | tac)
ls: cannot access '/usr/local/lib/ollama/cuda_v11/libggml-blas.so': No such file or directory
drwxr-xr-x 25 root root 4096 Apr 2 08:16 /
drwxr-xr-x 12 root root 4096 Feb 17 04:51 /usr
drwxr-xr-x 10 root root 4096 Feb 17 04:51 /usr/local
drwxr-xr-x 6 root root 4096 Apr 17 08:43 /usr/local/lib
drwxr-xr-x 3 ollama ollama 4096 Apr 17 08:43 /usr/local/lib/ollama
drwxr-xr-x 2 ollama ollama 4096 Apr 17 08:44 /usr/local/lib/ollama/cuda_v11
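The one-liner above walks from the target library up to `/` and then runs `ls -ld` on every ancestor, so a missing read or execute bit anywhere along the chain stands out immediately. A more readable sketch of the same check (same path and behavior as the command above; the `tac` is replaced by prepending):

```shell
#!/bin/sh
# Collect a path and all of its ancestor directories (root first),
# then list them so ownership/permission problems anywhere along the
# chain stand out. Equivalent to the dirname/tac one-liner above.
p="/usr/local/lib/ollama/cuda_v11/libggml-blas.so"
dirs=""
while [ "$p" != "/" ]; do
    dirs="$p $dirs"        # prepend, so the final list reads root -> leaf
    p=$(dirname "$p")
done
ls -ld / $dirs || true     # the leaf itself may be missing, as in the log
```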
@liuyixia-make commented on GitHub (Apr 22, 2025):
I solved the permissions problem, but the model is still assigned to the CPU.
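(The thread doesn't show the exact fix the reporter used. For reference, a typical way to clear this kind of permission problem is to restore world-read plus directory-traverse bits on the library tree; this is a hedged sketch assuming the default install path from the log, not the reporter's command.)

```shell
#!/bin/sh
# Sketch of a typical fix (not the reporter's exact command): give all
# users read access, plus execute on directories, so the ollama service
# account can traverse the tree and load the backend libraries. Run as root.
OLLAMA_LIB="/usr/local/lib/ollama"
if [ -d "$OLLAMA_LIB" ]; then
    # Symbolic mode X adds execute only where it makes sense
    # (directories, and files that are already executable),
    # unlike a blanket +x which would also mark the .so files.
    chmod -R a+rX "$OLLAMA_LIB"
fi
```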
root@hpry:~# journalctl -u ollama -n 500 --no-pager
Apr 22 21:35:11 hpry ollama[28685]: [GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4] Compute Capability 8.6
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:121 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:206 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=5710 unique_id=0
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card0/device/vendor error="open /sys/class/drm/card0/device/vendor: no such file or directory"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card0-Unknown-1/device/vendor error="open /sys/class/drm/card0-Unknown-1/device/vendor: no such file or directory"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=DEBUG source=amd_linux.go:240 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card1/device
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=INFO source=amd_linux.go:296 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=INFO source=amd_linux.go:402 msg="no compatible amdgpu devices detected"
Apr 22 21:35:11 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-de554a1a-5def-5a94-397d-5512a37432da library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.4 GiB"
Apr 22 21:35:11 hpry ollama[28685]: time=2025-04-22T21:35:11.156+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.4 GiB"
Apr 22 21:35:35 hpry ollama[28685]: [GIN] 2025/04/22 - 21:35:35 | 200 | 42.629µs | 127.0.0.1 | HEAD "/"
Apr 22 21:35:35 hpry ollama[28685]: [GIN] 2025/04/22 - 21:35:35 | 200 | 18.904389ms | 127.0.0.1 | POST "/api/show"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.422+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.6 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:35:35 hpry ollama[28685]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuInit - 0x72007127cbc0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDriverGetVersion - 0x72007127cbe0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetCount - 0x72007127cc20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGet - 0x72007127cc00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetAttribute - 0x72007127cd00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetUuid - 0x72007127cc60
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetName - 0x72007127cc40
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxCreate_v3 - 0x72007127cee0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuMemGetInfo_v2 - 0x720071286e20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxDestroy - 0x7200712e1850
Apr 22 21:35:35 hpry ollama[28685]: calling cuInit
Apr 22 21:35:35 hpry ollama[28685]: calling cuDriverGetVersion
Apr 22 21:35:35 hpry ollama[28685]: raw version 0x2f08
Apr 22 21:35:35 hpry ollama[28685]: CUDA driver version: 12.4
Apr 22 21:35:35 hpry ollama[28685]: calling cuDeviceGetCount
Apr 22 21:35:35 hpry ollama[28685]: device count 2
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.675+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.749+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:35 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.749+08:00 level=DEBUG source=sched.go:183 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.766+08:00 level=DEBUG source=sched.go:226 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.766+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=2 available="[23.4 GiB 23.4 GiB]"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.766+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.766+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.4 GiB" now.free_swap="8.0 GiB"
Apr 22 21:35:35 hpry ollama[28685]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuInit - 0x72007127cbc0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDriverGetVersion - 0x72007127cbe0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetCount - 0x72007127cc20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGet - 0x72007127cc00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetAttribute - 0x72007127cd00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetUuid - 0x72007127cc60
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetName - 0x72007127cc40
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxCreate_v3 - 0x72007127cee0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuMemGetInfo_v2 - 0x720071286e20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxDestroy - 0x7200712e1850
Apr 22 21:35:35 hpry ollama[28685]: calling cuInit
Apr 22 21:35:35 hpry ollama[28685]: calling cuDriverGetVersion
Apr 22 21:35:35 hpry ollama[28685]: raw version 0x2f08
Apr 22 21:35:35 hpry ollama[28685]: CUDA driver version: 12.4
Apr 22 21:35:35 hpry ollama[28685]: calling cuDeviceGetCount
Apr 22 21:35:35 hpry ollama[28685]: device count 2
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.925+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:35 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=INFO source=sched.go:732 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 library=cuda parallel=4 required="23.4 GiB"
Apr 22 21:35:35 hpry ollama[28685]: time=2025-04-22T21:35:35.988+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.4 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:35:35 hpry ollama[28685]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuInit - 0x72007127cbc0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDriverGetVersion - 0x72007127cbe0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetCount - 0x72007127cc20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGet - 0x72007127cc00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetAttribute - 0x72007127cd00
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetUuid - 0x72007127cc60
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuDeviceGetName - 0x72007127cc40
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxCreate_v3 - 0x72007127cee0
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuMemGetInfo_v2 - 0x720071286e20
Apr 22 21:35:35 hpry ollama[28685]: dlsym: cuCtxDestroy - 0x7200712e1850
Apr 22 21:35:35 hpry ollama[28685]: calling cuInit
Apr 22 21:35:35 hpry ollama[28685]: calling cuDriverGetVersion
Apr 22 21:35:35 hpry ollama[28685]: raw version 0x2f08
Apr 22 21:35:35 hpry ollama[28685]: CUDA driver version: 12.4
Apr 22 21:35:35 hpry ollama[28685]: calling cuDeviceGetCount
Apr 22 21:35:35 hpry ollama[28685]: device count 2
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.141+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.205+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:36 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.205+08:00 level=INFO source=server.go:105 msg="system memory" total="125.0 GiB" free="122.5 GiB" free_swap="8.0 GiB"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.205+08:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=2 available="[23.4 GiB 23.4 GiB]"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.205+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.206+08:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.0 GiB" before.free="122.5 GiB" before.free_swap="8.0 GiB" now.total="125.0 GiB" now.free="122.5 GiB" now.free_swap="8.0 GiB"
Apr 22 21:35:36 hpry ollama[28685]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuInit - 0x72007127cbc0
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDriverGetVersion - 0x72007127cbe0
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGetCount - 0x72007127cc20
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGet - 0x72007127cc00
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGetAttribute - 0x72007127cd00
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGetUuid - 0x72007127cc60
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuDeviceGetName - 0x72007127cc40
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuCtxCreate_v3 - 0x72007127cee0
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuMemGetInfo_v2 - 0x720071286e20
Apr 22 21:35:36 hpry ollama[28685]: dlsym: cuCtxDestroy - 0x7200712e1850
Apr 22 21:35:36 hpry ollama[28685]: calling cuInit
Apr 22 21:35:36 hpry ollama[28685]: calling cuDriverGetVersion
Apr 22 21:35:36 hpry ollama[28685]: raw version 0x2f08
Apr 22 21:35:36 hpry ollama[28685]: CUDA driver version: 12.4
Apr 22 21:35:36 hpry ollama[28685]: calling cuDeviceGetCount
Apr 22 21:35:36 hpry ollama[28685]: device count 2
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.355+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-de554a1a-5def-5a94-397d-5512a37432da name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4 name="NVIDIA GeForce RTX 3090" overhead="0 B" before.total="23.7 GiB" before.free="23.4 GiB" now.total="23.7 GiB" now.free="23.4 GiB" now.used="260.9 MiB"
Apr 22 21:35:36 hpry ollama[28685]: releasing cuda driver library
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=65 layers.offload=65 layers.split=33,32 memory.available="[23.4 GiB 23.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="23.4 GiB" memory.required.partial="23.4 GiB" memory.required.kv="2.0 GiB" memory.required.allocations="[12.0 GiB 11.4 GiB]" memory.weights.total="18.1 GiB" memory.weights.repeating="17.5 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="916.1 MiB" memory.graph.partial="916.1 MiB"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=128
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=INFO source=server.go:185 msg="enabling flash attention"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=WARN source=server.go:193 msg="kv cache type not supported by model" type=""
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.423+08:00 level=DEBUG source=server.go:262 msg="compatible gpu libraries" compatible=[cuda_v11]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 (version GGUF V3 (latest))
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 0: general.architecture str = qwen2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 1: general.type str = model
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 2: general.name str = Qwen2.5 32B Instruct
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 3: general.finetune str = Instruct
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 4: general.basename str = Qwen2.5
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 5: general.size_label str = 32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 6: general.license str = apache-2.0
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-3...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 22: general.file_type u32 = 15
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type f32: 321 tensors
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q4_K: 385 tensors
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q6_K: 65 tensors
Apr 22 21:35:36 hpry ollama[28685]: print_info: file format = GGUF V3 (latest)
Apr 22 21:35:36 hpry ollama[28685]: print_info: file type = Q4_K - Medium
Apr 22 21:35:36 hpry ollama[28685]: print_info: file size = 18.48 GiB (4.85 BPW)
Apr 22 21:35:36 hpry ollama[28685]: init_tokenizer: initializing tokenizer for type 2
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151660 '<|fim_middle|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151653 '<|vision_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151648 '<|box_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151649 '<|box_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151655 '<|image_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151651 '<|quad_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151652 '<|vision_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151654 '<|vision_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151656 '<|video_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151644 '<|im_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151650 '<|quad_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: special tokens cache size = 22
Apr 22 21:35:36 hpry ollama[28685]: load: token to piece cache size = 0.9310 MB
Apr 22 21:35:36 hpry ollama[28685]: print_info: arch = qwen2
Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab_only = 1
Apr 22 21:35:36 hpry ollama[28685]: print_info: model type = ?B
Apr 22 21:35:36 hpry ollama[28685]: print_info: model params = 32.76 B
Apr 22 21:35:36 hpry ollama[28685]: print_info: general.name = Qwen2.5 32B Instruct
Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab type = BPE
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_vocab = 152064
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_merges = 151387
Apr 22 21:35:36 hpry ollama[28685]: print_info: BOS token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOS token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOT token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: PAD token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: LF token = 198 'Ċ'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM REP token = 151663 '<|repo_name|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151662 '<|fim_pad|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151663 '<|repo_name|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151664 '<|file_sep|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: max token length = 256
Apr 22 21:35:36 hpry ollama[28685]: llama_model_load: vocab only - skipping tensors
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.534+08:00 level=DEBUG source=server.go:335 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.534+08:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --verbose --threads 16 --flash-attn --parallel 4 --tensor-split 33,32 --port 37071"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.534+08:00 level=DEBUG source=server.go:423 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin CUDA_VISIBLE_DEVICES=GPU-d73a3740-e0e8-a3a5-0ce2-0388bf79bdf4,GPU-de554a1a-5def-5a94-397d-5512a37432da LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v11:/usr/local/lib/ollama]"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.534+08:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.535+08:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.535+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.542+08:00 level=INFO source=runner.go:853 msg="starting go runner"
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.542+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v11
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.543+08:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.543+08:00 level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Apr 22 21:35:36 hpry ollama[28685]: time=2025-04-22T21:35:36.543+08:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:37071"
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: loaded meta data with 34 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-eabc98a9bcbfce7fd70f3e07de599f8fda98120fefed5881934161ede8bd1a41 (version GGUF V3 (latest))
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 0: general.architecture str = qwen2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 1: general.type str = model
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 2: general.name str = Qwen2.5 32B Instruct
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 3: general.finetune str = Instruct
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 4: general.basename str = Qwen2.5
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 5: general.size_label str = 32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 6: general.license str = apache-2.0
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-3...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 14: qwen2.block_count u32 = 64
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 5120
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 27648
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 40
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 8
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 22: general.file_type u32 = 15
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type f32: 321 tensors
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q4_K: 385 tensors
Apr 22 21:35:36 hpry ollama[28685]: llama_model_loader: - type q6_K: 65 tensors
Apr 22 21:35:36 hpry ollama[28685]: print_info: file format = GGUF V3 (latest)
Apr 22 21:35:36 hpry ollama[28685]: print_info: file type = Q4_K - Medium
Apr 22 21:35:36 hpry ollama[28685]: print_info: file size = 18.48 GiB (4.85 BPW)
Apr 22 21:35:36 hpry ollama[28685]: init_tokenizer: initializing tokenizer for type 2
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151660 '<|fim_middle|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151653 '<|vision_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151648 '<|box_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151649 '<|box_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151655 '<|image_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151651 '<|quad_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151652 '<|vision_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151654 '<|vision_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151656 '<|video_pad|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151644 '<|im_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: control token: 151650 '<|quad_start|>' is not marked as EOG
Apr 22 21:35:36 hpry ollama[28685]: load: special tokens cache size = 22
Apr 22 21:35:36 hpry ollama[28685]: load: token to piece cache size = 0.9310 MB
Apr 22 21:35:36 hpry ollama[28685]: print_info: arch = qwen2
Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab_only = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ctx_train = 32768
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd = 5120
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_layer = 64
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_head = 40
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_head_kv = 8
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_rot = 128
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_swa = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_head_k = 128
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_head_v = 128
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_gqa = 5
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_k_gqa = 1024
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_embd_v_gqa = 1024
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_norm_eps = 0.0e+00
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_norm_rms_eps = 1.0e-06
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_clamp_kqv = 0.0e+00
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_max_alibi_bias = 0.0e+00
Apr 22 21:35:36 hpry ollama[28685]: print_info: f_logit_scale = 0.0e+00
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ff = 27648
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_expert = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_expert_used = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: causal attn = 1
Apr 22 21:35:36 hpry ollama[28685]: print_info: pooling type = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: rope type = 2
Apr 22 21:35:36 hpry ollama[28685]: print_info: rope scaling = linear
Apr 22 21:35:36 hpry ollama[28685]: print_info: freq_base_train = 1000000.0
Apr 22 21:35:36 hpry ollama[28685]: print_info: freq_scale_train = 1
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_ctx_orig_yarn = 32768
Apr 22 21:35:36 hpry ollama[28685]: print_info: rope_finetuned = unknown
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_conv = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_inner = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_d_state = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_dt_rank = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: ssm_dt_b_c_rms = 0
Apr 22 21:35:36 hpry ollama[28685]: print_info: model type = 32B
Apr 22 21:35:36 hpry ollama[28685]: print_info: model params = 32.76 B
Apr 22 21:35:36 hpry ollama[28685]: print_info: general.name = Qwen2.5 32B Instruct
Apr 22 21:35:36 hpry ollama[28685]: print_info: vocab type = BPE
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_vocab = 152064
Apr 22 21:35:36 hpry ollama[28685]: print_info: n_merges = 151387
Apr 22 21:35:36 hpry ollama[28685]: print_info: BOS token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOS token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOT token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: PAD token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: LF token = 198 'Ċ'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM REP token = 151663 '<|repo_name|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151643 '<|endoftext|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151645 '<|im_end|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151662 '<|fim_pad|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151663 '<|repo_name|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: EOG token = 151664 '<|file_sep|>'
Apr 22 21:35:36 hpry ollama[28685]: print_info: max token length = 256
Apr 22 21:35:36 hpry ollama[28685]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 0 assigned to device CPU
Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 1 assigned to device CPU
Apr 22 21:35:36 hpry ollama[28685]: load_tensors: layer 2 assigned to device CPU
...
...
@rick-github commented on GitHub (Apr 22, 2025):
Your installation appears to be incomplete.
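One way to confirm this is to compare what is on disk against what a complete Linux install ships. As a minimal sketch (assuming the default install prefix `/usr/local/lib/ollama` and the cuda_v11 runner; adjust paths for your system), check that the GGML CUDA backend and the cuBLAS libraries with their version symlinks are all present:

```shell
# Hedged diagnostic sketch: a complete cuda_v11 runner directory contains
# libggml-cuda.so plus the cuBLAS libraries and their .so.11 symlinks.
# Library names below match the listing in this thread; the exact set may
# differ between Ollama releases.
for lib in libggml-cuda.so libcublas.so.11 libcublasLt.so.11; do
    if [ -e "/usr/local/lib/ollama/cuda_v11/$lib" ]; then
        echo "$lib: present"
    else
        echo "$lib: MISSING"
    fi
done
```

If anything is reported MISSING, re-running the official install script (`curl -fsSL https://ollama.com/install.sh | sh`) should restore the full library set.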
@liuyixia-make commented on GitHub (Apr 22, 2025):
Something doesn't seem right.
root@hpry:/usr/local/lib/ollama/cuda_v11# ls -l /usr/local/lib/ollama/cuda_v11
total 1010100
-rwxr-xr-x 1 ollama ollama 93134848 Apr 17 08:44 libcublasLt.so.11.5.1.109
lrwxrwxrwx 1 ollama ollama 23 Apr 7 12:34 libcublas.so.11 -> libcublas.so.11.5.1.109
-rwxr-xr-x 1 ollama ollama 121866104 May 5 2021 libcublas.so.11.5.1.109
-rwxr-xr-x 1 ollama ollama 819327824 Apr 7 12:34 libggml-cuda.so
@shaofanqi commented on GitHub (Apr 23, 2025):
Please help.
4月 23 17:39:17 ollama[2069]: time=2025-04-23T17:39:17.821+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="246.1 GiB" before.free_swap="2.0 GiB" now.total="251.5 GiB" now.free="246.1 GiB" now.free_swap="2.0 GiB"
4月 23 17:39:17 ollama[2069]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 17:39:17 ollama[2069]: dlsym: cuInit - 0x7f6c62679ef0
4月 23 17:39:17 ollama[2069]: dlsym: cuDriverGetVersion - 0x7f6c62679f10
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGetCount - 0x7f6c62679f50
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGet - 0x7f6c62679f30
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGetAttribute - 0x7f6c6267a030
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGetUuid - 0x7f6c62679f90
4月 23 17:39:17 ollama[2069]: dlsym: cuDeviceGetName - 0x7f6c62679f70
4月 23 17:39:17 ollama[2069]: dlsym: cuCtxCreate_v3 - 0x7f6c6267a210
4月 23 17:39:17 ollama[2069]: dlsym: cuMemGetInfo_v2 - 0x7f6c62684190
4月 23 17:39:17 ollama[2069]: dlsym: cuCtxDestroy - 0x7f6c626de7f0
4月 23 17:39:17 ollama[2069]: calling cuInit
4月 23 17:39:17 ollama[2069]: calling cuDriverGetVersion
4月 23 17:39:17 ollama[2069]: raw version 0x2f08
4月 23 17:39:17 ollama[2069]: CUDA driver version: 12.4
4月 23 17:39:17 ollama[2069]: calling cuDeviceGetCount
4月 23 17:39:17 ollama[2069]: device count 2
4月 23 17:39:17 ollama[2069]: time=2025-04-23T17:39:17.932+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.039+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.4 GiB"
4月 23 17:39:18 ollama[2069]: releasing cuda driver library
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.080+08:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.081+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.081+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.082+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.083+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.084+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.085+08:00 level=INFO source=sched.go:730 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 library=cuda parallel=4 required="37.4 GiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.085+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="246.1 GiB" before.free_swap="2.0 GiB" now.total="251.5 GiB" now.free="246.1 GiB" now.free_swap="2.0 GiB"
4月 23 17:39:18 ollama[2069]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 17:39:18 ollama[2069]: dlsym: cuInit - 0x7f6c62679ef0
4月 23 17:39:18 ollama[2069]: dlsym: cuDriverGetVersion - 0x7f6c62679f10
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGetCount - 0x7f6c62679f50
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGet - 0x7f6c62679f30
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGetAttribute - 0x7f6c6267a030
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGetUuid - 0x7f6c62679f90
4月 23 17:39:18 ollama[2069]: dlsym: cuDeviceGetName - 0x7f6c62679f70
4月 23 17:39:18 ollama[2069]: dlsym: cuCtxCreate_v3 - 0x7f6c6267a210
4月 23 17:39:18 ollama[2069]: dlsym: cuMemGetInfo_v2 - 0x7f6c62684190
4月 23 17:39:18 ollama[2069]: dlsym: cuCtxDestroy - 0x7f6c626de7f0
4月 23 17:39:18 ollama[2069]: calling cuInit
4月 23 17:39:18 ollama[2069]: calling cuDriverGetVersion
4月 23 17:39:18 ollama[2069]: raw version 0x2f08
4月 23 17:39:18 ollama[2069]: CUDA driver version: 12.4
4月 23 17:39:18 ollama[2069]: calling cuDeviceGetCount
4月 23 17:39:18 ollama[2069]: device count 2
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.176+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.263+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.4 GiB"
4月 23 17:39:18 ollama[2069]: releasing cuda driver library
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.263+08:00 level=INFO source=server.go:104 msg="system memory" total="251.5 GiB" free="246.1 GiB" free_swap="2.0 GiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.263+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.264+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=65 layers.offload=65 layers.split=33,32 memory.available="[20.2 GiB 19.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="37.4 GiB" memory.required.partial="37.4 GiB" memory.required.kv="2.0 GiB" memory.required.allocations="[19.1 GiB 18.3 GiB]" memory.weights.total="32.9 GiB" memory.weights.repeating="32.1 GiB" memory.weights.nonrepeating="788.9 MiB" memory.graph.full="916.1 MiB" memory.graph.partial="916.1 MiB"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=DEBUG source=gpu.go:713 msg="no filter required for library cpu"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 --ctx-size 8192 --batch-size 512 --n-gpu-layers 65 --verbose --threads 16 --parallel 4 --tensor-split 33,32 --port 35827"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=DEBUG source=server.go:393 msg=subprocess environment="[PATH=/home/eppei/miniconda3/envs/vllm/bin:/home/eppei/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:.:/usr/local/bin]"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.265+08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.266+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.287+08:00 level=INFO source=runner.go:936 msg="starting go runner"
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.287+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=16
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.287+08:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:35827"
4月 23 17:39:18 ollama[2069]: llama_model_loader: loaded meta data with 33 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 (version GGUF V3 (latest))
4月 23 17:39:18 ollama[2069]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 0: general.architecture str = qwen2
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 1: general.type str = model
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 2: general.name str = QwQ 32B
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 3: general.basename str = QwQ
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 4: general.size_label str = 32B
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 5: general.license str = apache-2.0
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/QWQ-32B/b...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 7: general.base_model.count u32 = 1
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 8: general.base_model.0.name str = Qwen2.5 32B
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 11: general.tags arr[str,2] = ["chat", "text-generation"]
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 13: qwen2.block_count u32 = 64
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 14: qwen2.context_length u32 = 131072
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 15: qwen2.embedding_length u32 = 5120
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 16: qwen2.feed_forward_length u32 = 27648
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 17: qwen2.attention.head_count u32 = 40
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 8
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = qwen2
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 151645
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 151643
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 151643
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = false
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 30: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 31: general.quantization_version u32 = 2
4月 23 17:39:18 ollama[2069]: llama_model_loader: - kv 32: general.file_type u32 = 7
4月 23 17:39:18 ollama[2069]: llama_model_loader: - type f32: 321 tensors
4月 23 17:39:18 ollama[2069]: llama_model_loader: - type q8_0: 450 tensors
4月 23 17:39:18 ollama[2069]: time=2025-04-23T17:39:18.517+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151648 '<|box_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151646 '<|object_ref_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151649 '<|box_end|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151647 '<|object_ref_end|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151644 '<|im_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
4月 23 17:39:18 ollama[2069]: llm_load_vocab: special tokens cache size = 26
4月 23 17:39:18 ollama[2069]: llm_load_vocab: token to piece cache size = 0.9311 MB
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: format = GGUF V3 (latest)
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: arch = qwen2
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: vocab type = BPE
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_vocab = 152064
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_merges = 151387
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: vocab_only = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_ctx_train = 131072
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd = 5120
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_layer = 64
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_head = 40
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_head_kv = 8
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_rot = 128
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_swa = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd_head_k = 128
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd_head_v = 128
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_gqa = 5
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd_k_gqa = 1024
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_embd_v_gqa = 1024
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_norm_eps = 0.0e+00
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: f_logit_scale = 0.0e+00
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_ff = 27648
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_expert = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_expert_used = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: causal attn = 1
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: pooling type = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: rope type = 2
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: rope scaling = linear
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: freq_base_train = 1000000.0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: freq_scale_train = 1
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: n_ctx_orig_yarn = 131072
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: rope_finetuned = unknown
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_d_conv = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_d_inner = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_d_state = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_dt_rank = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: ssm_dt_b_c_rms = 0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: model type = 32B
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: model ftype = Q8_0
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: model params = 32.76 B
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: model size = 32.42 GiB (8.50 BPW)
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: general.name = QwQ 32B
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOS token = 151645 '<|im_end|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOT token = 151645 '<|im_end|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: LF token = 148848 'ÄĬ'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151645 '<|im_end|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
4月 23 17:39:18 ollama[2069]: llm_load_print_meta: max token length = 256
4月 23 17:39:18 ollama[2069]: llm_load_tensors: tensor 'token_embd.weight' (q8_0) (and 770 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
4月 23 17:39:20 ollama[2069]: llm_load_tensors: CPU_Mapped model buffer size = 33202.08 MiB
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_seq_max = 4
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_ctx = 8192
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_ctx_per_seq = 2048
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_batch = 2048
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_ubatch = 512
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: flash_attn = 0
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: freq_base = 1000000.0
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: freq_scale = 1
4月 23 17:39:20 ollama[2069]: llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 64, can_shift = 1
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 40: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 41: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 42: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 43: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 44: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 45: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 46: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 47: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 48: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 49: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 50: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 51: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 52: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 53: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 54: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 55: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 56: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 57: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 58: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 59: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 60: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 61: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 62: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: llama_kv_cache_init: layer 63: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 17:39:20 ollama[2069]: time=2025-04-23T17:39:20.278+08:00 level=DEBUG source=server.go:600 msg="model load progress 1.00"
4月 23 17:39:20 ollama[2069]: time=2025-04-23T17:39:20.529+08:00 level=DEBUG source=server.go:603 msg="model load completed, waiting for server to become available" status="llm server loading model"
4月 23 17:39:21 ollama[2069]: llama_kv_cache_init: CPU KV buffer size = 2048.00 MiB
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: KV self size = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: CPU output buffer size = 2.40 MiB
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: CPU compute buffer size = 696.01 MiB
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: graph nodes = 2246
4月 23 17:39:21 ollama[2069]: llama_new_context_with_model: graph splits = 1
4月 23 17:39:21 ollama[2069]: time=2025-04-23T17:39:21.534+08:00 level=INFO source=server.go:594 msg="llama runner started in 3.27 seconds"
4月 23 17:39:21 ollama[2069]: time=2025-04-23T17:39:21.534+08:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 17:39:21 ollama[2069]: time=2025-04-23T17:39:21.536+08:00 level=DEBUG source=server.go:966 msg="new runner detected, loading model for cgo tokenization"
4月 23 17:39:21 ollama[2069]: llama_model_loader: loaded meta data with 33 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 (version GGUF V3 (latest))
4月 23 17:39:21 ollama[2069]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 0: general.architecture str = qwen2
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 1: general.type str = model
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 2: general.name str = QwQ 32B
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 3: general.basename str = QwQ
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 4: general.size_label str = 32B
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 5: general.license str = apache-2.0
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/QWQ-32B/b...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 7: general.base_model.count u32 = 1
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 8: general.base_model.0.name str = Qwen2.5 32B
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 11: general.tags arr[str,2] = ["chat", "text-generation"]
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 13: qwen2.block_count u32 = 64
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 14: qwen2.context_length u32 = 131072
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 15: qwen2.embedding_length u32 = 5120
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 16: qwen2.feed_forward_length u32 = 27648
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 17: qwen2.attention.head_count u32 = 40
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 8
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = qwen2
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 151645
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 151643
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 151643
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = false
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 30: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 31: general.quantization_version u32 = 2
4月 23 17:39:21 ollama[2069]: llama_model_loader: - kv 32: general.file_type u32 = 7
4月 23 17:39:21 ollama[2069]: llama_model_loader: - type f32: 321 tensors
4月 23 17:39:21 ollama[2069]: llama_model_loader: - type q8_0: 450 tensors
4月 23 17:39:21 ollama[2069]: llm_load_vocab: special tokens cache size = 26
4月 23 17:39:22 ollama[2069]: llm_load_vocab: token to piece cache size = 0.9311 MB
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: format = GGUF V3 (latest)
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: arch = qwen2
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: vocab type = BPE
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: n_vocab = 152064
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: n_merges = 151387
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: vocab_only = 1
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: model type = ?B
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: model ftype = all F32
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: model params = 32.76 B
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: model size = 32.42 GiB (8.50 BPW)
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: general.name = QwQ 32B
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOS token = 151645 '<|im_end|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOT token = 151645 '<|im_end|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: LF token = 148848 'ÄĬ'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151645 '<|im_end|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
4月 23 17:39:22 ollama[2069]: llm_load_print_meta: max token length = 256
4月 23 17:39:22 ollama[2069]: llama_model_load: vocab only - skipping tensors
4月 23 17:39:22 ollama[2069]: time=2025-04-23T17:39:22.347+08:00 level=DEBUG source=routes.go:1470 msg="chat request" images=0 prompt="<|im_start|>system\n你是xxx<|im_end|>\n<|im_start|>user\n你好<|im_end|>\n<|im_start|>assistant\n"
4月 23 17:39:22 ollama[2069]: time=2025-04-23T17:39:22.404+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=1594 used=0 remaining=1594
4月 23 17:39:35 ollama[2069]: [GIN] 2025/04/23 - 17:39:35 | 200 | 42.814µs | 127.0.0.1 | HEAD "/"
4月 23 17:39:35 ollama[2069]: [GIN] 2025/04/23 - 17:39:35 | 200 | 53.549µs | 127.0.0.1 | GET "/api/ps"
4月 23 18:01:48 ollama[2069]: [GIN] 2025/04/23 - 18:01:48 | 200 | 22m31s | 127.0.0.1 | POST "/v1/chat/completions"
4月 23 18:01:48 ollama[2069]: time=2025-04-23T18:01:48.824+08:00 level=DEBUG source=sched.go:466 msg="context for request finished"
4月 23 18:01:48 ollama[2069]: time=2025-04-23T18:01:48.824+08:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 duration=5m0s
4月 23 18:01:48 ollama[2069]: time=2025-04-23T18:01:48.824+08:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 refCount=0
@rick-github commented on GitHub (Apr 23, 2025):
Runner didn't find any backends. What's the output from
@shaofanqi commented on GitHub (Apr 23, 2025):
`/usr/local/lib/ollama`
This folder is empty.
@rick-github commented on GitHub (Apr 23, 2025):
Your installation is incomplete or otherwise incorrect. The output should be something like:
That is, the directory should contain the CPU and GPU backends for ollama. How did you install ollama?
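A quick sanity check for this (a minimal sketch; the path assumes the default Linux install location, and the function name is just for illustration):

```shell
# check_backends DIR reports whether DIR is populated. On a healthy Linux
# install, /usr/local/lib/ollama holds the CPU and GPU backend libraries
# the runner loads at startup; an empty directory forces a CPU-only fallback.
check_backends() {
  if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
    echo "backend directory populated"
  else
    echo "backend directory missing or empty"
  fi
}

check_backends "${OLLAMA_LIB:-/usr/local/lib/ollama}"
```

If it comes back empty, re-running the official Linux install script (`curl -fsSL https://ollama.com/install.sh | sh`) should restore the backend libraries.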
@shaofanqi commented on GitHub (Apr 23, 2025):
sudo lsof -c ollama
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/128/gvfs
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.portal file system /run/user/128/doc
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ollama 2069 ollama cwd DIR 259,3 4096 2 /
ollama 2069 ollama rtd DIR 259,3 4096 2 /
ollama 2069 ollama txt REG 259,3 30038968 36732173 /usr/local/bin/ollama
ollama 2069 ollama mem REG 259,3 28392536 36599527 /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
ollama 2069 ollama mem CHR 195,1 984 /dev/nvidia1
ollama 2069 ollama mem CHR 195,0 983 /dev/nvidia0
ollama 2069 ollama mem REG 259,3 2220400 36571227 /usr/lib/x86_64-linux-gnu/libc.so.6
ollama 2069 ollama mem REG 259,3 2260296 36597726 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
ollama 2069 ollama mem REG 259,3 125488 36596952 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
ollama 2069 ollama mem REG 259,3 940560 36571232 /usr/lib/x86_64-linux-gnu/libm.so.6
ollama 2069 ollama mem REG 259,3 14664 36571276 /usr/lib/x86_64-linux-gnu/librt.so.1
ollama 2069 ollama mem REG 259,3 21448 36571274 /usr/lib/x86_64-linux-gnu/libpthread.so.0
ollama 2069 ollama mem REG 259,3 14432 36571230 /usr/lib/x86_64-linux-gnu/libdl.so.2
ollama 2069 ollama mem REG 259,3 68552 36571275 /usr/lib/x86_64-linux-gnu/libresolv.so.2
ollama 2069 ollama mem REG 259,3 240936 36571221 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
ollama 2069 ollama 0r CHR 1,3 0t0 5 /dev/null
ollama 2069 ollama 1u unix 0xffffa097d54b5d80 0t0 24785 type=STREAM
ollama 2069 ollama 2u unix 0xffffa097d54b5d80 0t0 24785 type=STREAM
ollama 2069 ollama 3u IPv4 48380 0t0 TCP localhost:11434 (LISTEN)
ollama 2069 ollama 4u a_inode 0,15 0 1049 [eventpoll]
ollama 2069 ollama 5u a_inode 0,15 0 1049 [eventfd]
ollama 2069 ollama 6u IPv4 57398 0t0 TCP localhost:11434->localhost:42744 (ESTABLISHED)
ollama 2069 ollama 7u a_inode 0,15 0 1049 [eventfd]
ollama 2069 ollama 8r FIFO 0,14 0t0 33089 pipe
ollama 2069 ollama 9w FIFO 0,14 0t0 33089 pipe
ollama 2069 ollama 10r FIFO 0,14 0t0 33090 pipe
ollama 2069 ollama 11w FIFO 0,14 0t0 33090 pipe
ollama 2069 ollama 12u CHR 195,255 0t0 982 /dev/nvidiactl
ollama 2069 ollama 13u CHR 234,0 0t0 985 /dev/nvidia-uvm
ollama 2069 ollama 14u CHR 195,0 0t0 983 /dev/nvidia0
ollama 2069 ollama 15u CHR 195,1 0t0 984 /dev/nvidia1
ollama 2069 ollama 16u CHR 195,0 0t0 983 /dev/nvidia0
ollama 2069 ollama 17u CHR 195,0 0t0 983 /dev/nvidia0
ollama 2069 ollama 18u CHR 195,1 0t0 984 /dev/nvidia1
ollama 2069 ollama 19u CHR 195,1 0t0 984 /dev/nvidia1
ollama 2069 ollama 20u unix 0xffffa097cadf7700 0t0 33097 @cuda-uvmfd-4026531836-2069@ type=SEQPACKET
ollama 2069 ollama 22r FIFO 0,14 0t0 36779 pipe
ollama 2069 ollama 26u a_inode 0,15 0 1049 [pidfd]
ollama 16277 ollama cwd DIR 259,3 4096 2 /
ollama 16277 ollama rtd DIR 259,3 4096 2 /
ollama 16277 ollama txt REG 259,3 30038968 36732173 /usr/local/bin/ollama
ollama 16277 ollama mem REG 259,3 34820885056 36603946 /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
ollama 16277 ollama mem REG 259,3 2220400 36571227 /usr/lib/x86_64-linux-gnu/libc.so.6
ollama 16277 ollama mem REG 259,3 2260296 36597726 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
ollama 16277 ollama mem REG 259,3 125488 36596952 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
ollama 16277 ollama mem REG 259,3 940560 36571232 /usr/lib/x86_64-linux-gnu/libm.so.6
ollama 16277 ollama mem REG 259,3 14664 36571276 /usr/lib/x86_64-linux-gnu/librt.so.1
ollama 16277 ollama mem REG 259,3 21448 36571274 /usr/lib/x86_64-linux-gnu/libpthread.so.0
ollama 16277 ollama mem REG 259,3 14432 36571230 /usr/lib/x86_64-linux-gnu/libdl.so.2
ollama 16277 ollama mem REG 259,3 68552 36571275 /usr/lib/x86_64-linux-gnu/libresolv.so.2
ollama 16277 ollama mem REG 259,3 240936 36571221 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
ollama 16277 ollama 0r CHR 1,3 0t0 5 /dev/null
ollama 16277 ollama 1u unix 0xffffa097d54b5d80 0t0 24785 type=STREAM
ollama 16277 ollama 2w FIFO 0,14 0t0 36779 pipe
ollama 16277 ollama 3u IPv4 36782 0t0 TCP localhost:44491 (LISTEN)
ollama 16277 ollama 4u a_inode 0,15 0 1049 [eventpoll]
ollama 16277 ollama 5u a_inode 0,15 0 1049 [eventfd]
It worked fine this morning. Around noon I ran `/set parameter num_ctx 131072` and ollama ran in mixed CPU/GPU mode. After exiting with Ctrl+D and running `/set parameter num_ctx 3000` again, it fails to use the GPU.
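As an aside, the KV cache is why `num_ctx` matters so much here: with f16 K/V, each of the model's 64 layers stores `n_ctx × n_embd_k_gqa` half-precision values for K and again for V. A sketch of the arithmetic, using the `n_layer = 64` and `n_embd_k_gqa = 1024` figures from the `llama_kv_cache_init` lines in the log above (the helper name is just for illustration):

```shell
# kv_mib N_LAYER N_CTX N_EMBD_GQA estimates the f16 KV-cache size in MiB:
# K and V each hold n_layer * n_ctx * n_embd_gqa values at 2 bytes apiece.
kv_mib() {
  n_layer=$1; n_ctx=$2; n_embd_gqa=$3
  echo $(( 2 * n_layer * n_ctx * n_embd_gqa * 2 / 1024 / 1024 ))
}

kv_mib 64 8192 1024     # 2048, matching the log's "KV self size = 2048.00 MiB"
kv_mib 64 131072 1024   # 32768 MiB = 32 GiB at num_ctx 131072
```

At `num_ctx 131072` the KV cache alone is 32 GiB, which is why the scheduler log reports `memory.required.kv="32.0 GiB"` and can fit only a few layers on the two 22 GiB cards.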
@rick-github commented on GitHub (Apr 23, 2025):
How do you know this?
@shaofanqi commented on GitHub (Apr 23, 2025):
`ollama ps` shows CPU 57% / GPU 43%.
@rick-github commented on GitHub (Apr 23, 2025):
`ollama ps` reports what ollama expects the split to be, based on being able to load a GPU backend. But if the runner can't find a backend, it will fall back to CPU only, and the output from `ollama ps` will be incorrect. You need to check the logs to determine whether the runner did in fact load 43% of the model in VRAM.
@shaofanqi commented on GitHub (Apr 23, 2025):
This is the log
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.688+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:11 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:11 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:11 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:11 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:11 ollama[2186]: calling cuInit
4月 23 16:32:11 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:11 ollama[2186]: raw version 0x2f08
4月 23 16:32:11 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:11 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:11 ollama[2186]: device count 2
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.777+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.865+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:11 ollama[2186]: releasing cuda driver library
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.865+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:11 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:11 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:11 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:11 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:11 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:11 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:11 ollama[2186]: calling cuInit
4月 23 16:32:11 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:11 ollama[2186]: raw version 0x2f08
4月 23 16:32:11 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:11 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:11 ollama[2186]: device count 2
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.937+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.452710664 model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:11 ollama[2186]: time=2025-04-23T16:32:11.952+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.041+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.041+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:12 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:12 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:12 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:12 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:12 ollama[2186]: calling cuInit
4月 23 16:32:12 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:12 ollama[2186]: raw version 0x2f08
4月 23 16:32:12 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:12 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:12 ollama[2186]: device count 2
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.078+08:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.078+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-10670930-fdf7-6410-320c-f8808f4aac75 library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="20.2 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="41.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.079+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="19.8 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="41.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.082+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.084+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=1 available="[19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.085+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-10670930-fdf7-6410-320c-f8808f4aac75 library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="20.2 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="51.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:186 msg="gpu has too little memory to allocate any layers" id=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="21.7 GiB" available="19.8 GiB" minimum_memory=479199232 layer_size="2.5 GiB" gpu_zer_overhead="0 B" partial_offload="51.0 GiB" full_offload="51.0 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:330 msg="insufficient VRAM to load any model layers"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.087+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[20.2 GiB 19.8 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.131+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.733618916 model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.218+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:32:12 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:32:12 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:32:12 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:32:12 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:32:12 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:32:12 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:32:12 ollama[2186]: calling cuInit
4月 23 16:32:12 ollama[2186]: calling cuDriverGetVersion
4月 23 16:32:12 ollama[2186]: raw version 0x2f08
4月 23 16:32:12 ollama[2186]: CUDA driver version: 12.4
4月 23 16:32:12 ollama[2186]: calling cuDeviceGetCount
4月 23 16:32:12 ollama[2186]: device count 2
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.305+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:32:12 ollama[2186]: releasing cuda driver library
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=INFO source=server.go:104 msg="system memory" total="251.5 GiB" free="243.1 GiB" free_swap="1.4 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.392+08:00 level=DEBUG source=memory.go:107 msg=evaluating library=cuda gpu_count=2 available="[19.8 GiB 20.2 GiB]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=65 layers.offload=11 layers.split=5,6 memory.available="[19.8 GiB 20.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="92.1 GiB" memory.required.partial="39.2 GiB" memory.required.kv="32.0 GiB" memory.required.allocations="[19.1 GiB 20.1 GiB]" memory.weights.total="62.9 GiB" memory.weights.repeating="62.1 GiB" memory.weights.nonrepeating="788.9 MiB" memory.graph.full="12.8 GiB" memory.graph.partial="12.8 GiB"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=DEBUG source=gpu.go:713 msg="no filter required for library cpu"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 --ctx-size 131072 --batch-size 512 --n-gpu-layers 11 --verbose --threads 16 --parallel 1 --tensor-split 5,6 --port 39391"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.393+08:00 level=DEBUG source=server.go:393 msg=subprocess environment="[PATH=/home/eppei/miniconda3/envs/vllm/bin:/home/eppei/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama:/usr/local/bin]"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.394+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:936 msg="starting go runner"
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | CPU : LLAMAFILE = 1 | AARCH64_REPACK = 1 | cgo(gcc)" threads=16
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.416+08:00 level=INFO source=runner.go:995 msg="Server listening on 127.0.0.1:39391"
4月 23 16:32:12 ollama[2186]: llama_model_loader: loaded meta data with 33 key-value pairs and 771 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 (version GGUF V3 (latest))
4月 23 16:32:12 ollama[2186]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 0: general.architecture str = qwen2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 1: general.type str = model
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 2: general.name str = QwQ 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 3: general.basename str = QwQ
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 4: general.size_label str = 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 5: general.license str = apache-2.0
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/QWQ-32B/b...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 7: general.base_model.count u32 = 1
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 8: general.base_model.0.name str = Qwen2.5 32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-32B
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 11: general.tags arr[str,2] = ["chat", "text-generation"]
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 13: qwen2.block_count u32 = 64
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 14: qwen2.context_length u32 = 131072
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 15: qwen2.embedding_length u32 = 5120
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 16: qwen2.feed_forward_length u32 = 27648
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 17: qwen2.attention.head_count u32 = 40
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 8
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 22: tokenizer.ggml.pre str = qwen2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 151645
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 151643
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 151643
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = false
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 30: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 31: general.quantization_version u32 = 2
4月 23 16:32:12 ollama[2186]: llama_model_loader: - kv 32: general.file_type u32 = 7
4月 23 16:32:12 ollama[2186]: llama_model_loader: - type f32: 321 tensors
4月 23 16:32:12 ollama[2186]: llama_model_loader: - type q8_0: 450 tensors
4月 23 16:32:12 ollama[2186]: time=2025-04-23T16:32:12.646+08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server loading model"
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151648 '<|box_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151646 '<|object_ref_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151649 '<|box_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151647 '<|object_ref_end|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151644 '<|im_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
4月 23 16:32:12 ollama[2186]: llm_load_vocab: special tokens cache size = 26
4月 23 16:32:13 ollama[2186]: llm_load_vocab: token to piece cache size = 0.9311 MB
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: format = GGUF V3 (latest)
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: arch = qwen2
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: vocab type = BPE
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_vocab = 152064
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_merges = 151387
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: vocab_only = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ctx_train = 131072
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd = 5120
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_layer = 64
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_head = 40
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_head_kv = 8
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_rot = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_swa = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_head_k = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_head_v = 128
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_gqa = 5
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_k_gqa = 1024
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_embd_v_gqa = 1024
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_norm_eps = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: f_logit_scale = 0.0e+00
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ff = 27648
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_expert = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_expert_used = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: causal attn = 1
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: pooling type = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope type = 2
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope scaling = linear
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: freq_base_train = 1000000.0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: freq_scale_train = 1
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: n_ctx_orig_yarn = 131072
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: rope_finetuned = unknown
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_conv = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_inner = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_d_state = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_dt_rank = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: ssm_dt_b_c_rms = 0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model type = 32B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model ftype = Q8_0
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model params = 32.76 B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: model size = 32.42 GiB (8.50 BPW)
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: general.name = QwQ 32B
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOS token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOT token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: LF token = 148848 'ÄĬ'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151645 '<|im_end|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
4月 23 16:32:13 ollama[2186]: llm_load_print_meta: max token length = 256
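As a quick sanity check (not part of the log; the constants are copied from the llm_load_print_meta lines above), the reported 8.50 BPW follows directly from the model size and parameter count:

```python
# Hedged arithmetic sketch: derive "8.50 BPW" from the logged
# model size (32.42 GiB) and parameter count (32.76 B).
size_gib = 32.42                          # llm_load_print_meta: model size
params_b = 32.76                          # llm_load_print_meta: model params

size_bits = size_gib * (1024 ** 3) * 8    # GiB -> bytes -> bits
bpw = size_bits / (params_b * 1e9)        # bits per weight
print(f"{bpw:.2f} BPW")                   # prints "8.50 BPW", matching the log
```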
4月 23 16:32:13 ollama[2186]: llm_load_tensors: tensor 'token_embd.weight' (q8_0) (and 770 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
4月 23 16:32:14 ollama[2186]: llm_load_tensors: CPU_Mapped model buffer size = 33202.08 MiB
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_seq_max = 1
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ctx = 131072
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ctx_per_seq = 131072
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_batch = 512
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: n_ubatch = 512
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: flash_attn = 0
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: freq_base = 1000000.0
4月 23 16:32:14 ollama[2186]: llama_new_context_with_model: freq_scale = 1
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: kv_size = 131072, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 64, can_shift = 1
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: [layers 1-62 elided: identical output, n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 for every layer]
4月 23 16:32:14 ollama[2186]: llama_kv_cache_init: layer 63: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
4月 23 16:32:14 ollama[2186]: time=2025-04-23T16:32:14.906+08:00 level=DEBUG source=server.go:600 msg="model load progress 1.00"
4月 23 16:32:15 ollama[2186]: time=2025-04-23T16:32:15.158+08:00 level=DEBUG source=server.go:603 msg="model load completed, waiting for server to become available" status="llm server loading model"
4月 23 16:32:31 ollama[2186]: llama_kv_cache_init: CPU KV buffer size = 32768.00 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: KV self size = 32768.00 MiB, K (f16): 16384.00 MiB, V (f16): 16384.00 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: CPU output buffer size = 0.60 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: CPU compute buffer size = 10536.01 MiB
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: graph nodes = 2246
4月 23 16:32:31 ollama[2186]: llama_new_context_with_model: graph splits = 1
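The 32 GiB KV cache reported above is exactly what the logged hyperparameters imply; a minimal sketch (constants copied from this log, not queried from the runner):

```python
# Hedged sanity check: reproduce "KV self size = 32768.00 MiB" from the
# hyperparameters printed earlier in this log.
n_ctx = 131072          # llama_kv_cache_init: kv_size (--ctx-size)
n_layer = 64            # qwen2.block_count
n_embd_gqa = 1024       # n_embd_k_gqa == n_embd_v_gqa per layer
bytes_per_elem = 2      # f16 cache (type_k = type_v = 'f16')

kv_bytes = n_ctx * n_layer * n_embd_gqa * bytes_per_elem * 2  # K and V
kv_mib = kv_bytes / (1024 ** 2)
print(f"KV cache: {kv_mib:.2f} MiB")  # prints "KV cache: 32768.00 MiB"
```

At this context length the f16 KV cache alone exceeds the combined VRAM of both 2080 Ti cards, which is why only 11 of 65 layers are offloaded and the cache lands in system memory ("CPU KV buffer size = 32768.00 MiB").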
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.489+08:00 level=INFO source=server.go:594 msg="llama runner started in 19.09 seconds"
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.489+08:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: [48.0K blob data]
4月 23 16:32:31 ollama[2186]: so that corresponding comments and suggestions can be put forward<|im_end|>\n<|im_start|>assistant\n"
4月 23 16:32:31 ollama[2186]: time=2025-04-23T16:32:31.889+08:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=70179 used=0 remaining=70179
4月 23 16:33:57 ollama[2186]: [GIN] 2025/04/23 - 16:33:57 | 200 | 38.816µs | 127.0.0.1 | HEAD "/"
4月 23 16:33:57 ollama[2186]: [GIN] 2025/04/23 - 16:33:57 | 200 | 63.861µs | 127.0.0.1 | GET "/api/ps"
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:466 msg="context for request finished"
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 duration=5m0s
4月 23 16:34:30 ollama[2186]: time=2025-04-23T16:34:30.883+08:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 refCount=0
4月 23 16:34:30 ollama[2186]: [GIN] 2025/04/23 - 16:34:30 | 200 | 2m24s | 127.0.0.1 | POST "/api/chat"
4月 23 16:34:35 ollama[2186]: [GIN] 2025/04/23 - 16:34:35 | 200 | 35.318µs | 127.0.0.1 | HEAD "/"
4月 23 16:34:35 ollama[2186]: [GIN] 2025/04/23 - 16:34:35 | 200 | 39.087821ms | 127.0.0.1 | POST "/api/show"
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:283 msg="resetting model to expire immediately to make room" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695 refCount=0
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:296 msg="waiting for pending requests to complete and unload to occur" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:360 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=sched.go:375 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:35 ollama[2186]: time=2025-04-23T16:34:35.936+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="210.8 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:35 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:35 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:35 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:35 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:35 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:35 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:35 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:35 ollama[2186]: calling cuInit
4月 23 16:34:35 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:35 ollama[2186]: raw version 0x2f08
4月 23 16:34:35 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:35 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:35 ollama[2186]: device count 2
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.075+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.218+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:36 ollama[2186]: releasing cuda driver library
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.218+08:00 level=DEBUG source=server.go:1079 msg="stopping llama server"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.219+08:00 level=DEBUG source=server.go:1085 msg="waiting for llama server to exit"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.469+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="210.8 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="216.9 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:36 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:36 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:36 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:36 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:36 ollama[2186]: calling cuInit
4月 23 16:34:36 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:36 ollama[2186]: raw version 0x2f08
4月 23 16:34:36 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:36 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:36 ollama[2186]: device count 2
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.566+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.667+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:36 ollama[2186]: releasing cuda driver library
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.720+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="216.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="222.2 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:36 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:36 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:36 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:36 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:36 ollama[2186]: calling cuInit
4月 23 16:34:36 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:36 ollama[2186]: raw version 0x2f08
4月 23 16:34:36 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:36 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:36 ollama[2186]: device count 2
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.817+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.910+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:36 ollama[2186]: releasing cuda driver library
4月 23 16:34:36 ollama[2186]: time=2025-04-23T16:34:36.969+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="222.2 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="227.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:36 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:36 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:36 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:36 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:36 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:36 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:36 ollama[2186]: calling cuInit
4月 23 16:34:36 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:36 ollama[2186]: raw version 0x2f08
4月 23 16:34:36 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:36 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:36 ollama[2186]: device count 2
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.066+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.160+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:37 ollama[2186]: releasing cuda driver library
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.219+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="227.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="231.8 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:37 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:37 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:37 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:37 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:37 ollama[2186]: calling cuInit
4月 23 16:34:37 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:37 ollama[2186]: raw version 0x2f08
4月 23 16:34:37 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:37 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:37 ollama[2186]: device count 2
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.328+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.415+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:37 ollama[2186]: releasing cuda driver library
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.470+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="231.8 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="236.6 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:37 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:37 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:37 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:37 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:37 ollama[2186]: calling cuInit
4月 23 16:34:37 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:37 ollama[2186]: raw version 0x2f08
4月 23 16:34:37 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:37 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:37 ollama[2186]: device count 2
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.566+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.659+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:37 ollama[2186]: releasing cuda driver library
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.719+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="236.6 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="241.4 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:37 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:37 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:37 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:37 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:37 ollama[2186]: calling cuInit
4月 23 16:34:37 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:37 ollama[2186]: raw version 0x2f08
4月 23 16:34:37 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:37 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:37 ollama[2186]: device count 2
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.811+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.898+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:37 ollama[2186]: releasing cuda driver library
4月 23 16:34:37 ollama[2186]: time=2025-04-23T16:34:37.970+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="241.4 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="242.9 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:37 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:37 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:37 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:37 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:37 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:37 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:37 ollama[2186]: calling cuInit
4月 23 16:34:37 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:37 ollama[2186]: raw version 0x2f08
4月 23 16:34:37 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:37 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:37 ollama[2186]: device count 2
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.061+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.152+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:38 ollama[2186]: releasing cuda driver library
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.219+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="242.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="242.9 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:38 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:38 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:38 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:38 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:38 ollama[2186]: calling cuInit
4月 23 16:34:38 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:38 ollama[2186]: raw version 0x2f08
4月 23 16:34:38 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:38 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:38 ollama[2186]: device count 2
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.306+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.391+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:38 ollama[2186]: releasing cuda driver library
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.469+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="242.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="242.9 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:38 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:38 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:38 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:38 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:38 ollama[2186]: calling cuInit
4月 23 16:34:38 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:38 ollama[2186]: raw version 0x2f08
4月 23 16:34:38 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:38 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:38 ollama[2186]: device count 2
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.562+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.653+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:38 ollama[2186]: releasing cuda driver library
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.687+08:00 level=DEBUG source=server.go:1089 msg="llama server stopped"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.687+08:00 level=DEBUG source=sched.go:380 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-6c0b473616167f14af406d122ed3a12f374f6958d2ed2a5af0a889e0cf9f1695
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.719+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="242.9 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:38 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:38 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:38 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:38 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:38 ollama[2186]: calling cuInit
4月 23 16:34:38 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:38 ollama[2186]: raw version 0x2f08
4月 23 16:34:38 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:38 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:38 ollama[2186]: device count 2
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.814+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.904+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:38 ollama[2186]: releasing cuda driver library
4月 23 16:34:38 ollama[2186]: time=2025-04-23T16:34:38.970+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:38 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:38 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:38 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:38 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:38 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:38 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:38 ollama[2186]: calling cuInit
4月 23 16:34:38 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:38 ollama[2186]: raw version 0x2f08
4月 23 16:34:38 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:38 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:38 ollama[2186]: device count 2
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.059+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.143+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:39 ollama[2186]: releasing cuda driver library
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.219+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:39 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:39 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:39 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:39 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:39 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:39 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:39 ollama[2186]: calling cuInit
4月 23 16:34:39 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:39 ollama[2186]: raw version 0x2f08
4月 23 16:34:39 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:39 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:39 ollama[2186]: device count 2
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.316+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.408+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:39 ollama[2186]: releasing cuda driver library
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.469+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:39 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:39 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:39 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:39 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:39 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:39 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:39 ollama[2186]: calling cuInit
4月 23 16:34:39 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:39 ollama[2186]: raw version 0x2f08
4月 23 16:34:39 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:39 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:39 ollama[2186]: device count 2
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.560+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.651+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:39 ollama[2186]: releasing cuda driver library
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.719+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
4月 23 16:34:39 ollama[2186]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.15
4月 23 16:34:39 ollama[2186]: dlsym: cuInit - 0x7fa7ace79ef0
4月 23 16:34:39 ollama[2186]: dlsym: cuDriverGetVersion - 0x7fa7ace79f10
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetCount - 0x7fa7ace79f50
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGet - 0x7fa7ace79f30
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetAttribute - 0x7fa7ace7a030
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetUuid - 0x7fa7ace79f90
4月 23 16:34:39 ollama[2186]: dlsym: cuDeviceGetName - 0x7fa7ace79f70
4月 23 16:34:39 ollama[2186]: dlsym: cuCtxCreate_v3 - 0x7fa7ace7a210
4月 23 16:34:39 ollama[2186]: dlsym: cuMemGetInfo_v2 - 0x7fa7ace84190
4月 23 16:34:39 ollama[2186]: dlsym: cuCtxDestroy - 0x7fa7acede7f0
4月 23 16:34:39 ollama[2186]: calling cuInit
4月 23 16:34:39 ollama[2186]: calling cuDriverGetVersion
4月 23 16:34:39 ollama[2186]: raw version 0x2f08
4月 23 16:34:39 ollama[2186]: CUDA driver version: 12.4
4月 23 16:34:39 ollama[2186]: calling cuDeviceGetCount
4月 23 16:34:39 ollama[2186]: device count 2
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.807+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-31e889b0-8e81-ae1f-c7c3-56b903ee95de name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="19.8 GiB" now.total="21.7 GiB" now.free="19.8 GiB" now.used="1.9 GiB"
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.891+08:00 level=DEBUG source=gpu.go:456 msg="updating cuda memory data" gpu=GPU-10670930-fdf7-6410-320c-f8808f4aac75 name="NVIDIA GeForce RTX 2080 Ti" overhead="0 B" before.total="21.7 GiB" before.free="20.2 GiB" now.total="21.7 GiB" now.free="20.2 GiB" now.used="1.5 GiB"
4月 23 16:34:39 ollama[2186]: releasing cuda driver library
4月 23 16:34:39 ollama[2186]: time=2025-04-23T16:34:39.970+08:00 level=DEBUG source=gpu.go:406 msg="updating system memory data" before.total="251.5 GiB" before.free="243.1 GiB" before.free_swap="1.4 GiB" now.total="251.5 GiB" now.free="243.1 GiB" now.free_swap="1.4 GiB"
@shaofanqi commented on GitHub (Apr 23, 2025):
I may have found the reason the runners/backends were deleted. I failed to run the update script (https://ollama.com/install.sh), and I think this part of the script:

    if [ -d "$OLLAMA_INSTALL_DIR/lib/ollama" ] ; then
        status "Cleaning up old version at $OLLAMA_INSTALL_DIR/lib/ollama"
        $SUDO rm -rf "$OLLAMA_INSTALL_DIR/lib/ollama"
    fi

deleted the old runners, and the new ones were never installed successfully.
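The failure mode described above can be reproduced against a scratch directory (a sketch only; the real script defaults OLLAMA_INSTALL_DIR to /usr/local, and the paths here are illustrative):

```shell
# Simulate the install script's cleanup step in an isolated temp directory.
OLLAMA_INSTALL_DIR=$(mktemp -d)
mkdir -p "$OLLAMA_INSTALL_DIR/lib/ollama"

# This mirrors the quoted snippet: the old runner libraries are removed first.
if [ -d "$OLLAMA_INSTALL_DIR/lib/ollama" ]; then
    echo "Cleaning up old version at $OLLAMA_INSTALL_DIR/lib/ollama"
    rm -rf "$OLLAMA_INSTALL_DIR/lib/ollama"
fi

# If the install is interrupted at this point, nothing replaces the runners:
[ -d "$OLLAMA_INSTALL_DIR/lib/ollama" ] && echo "runners present" || echo "runners missing"
```

This is why an interrupted update leaves the server running with no GPU backends on disk: the cleanup is destructive and runs before the new files are extracted.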
@rick-github commented on GitHub (Apr 23, 2025):
The ollama server calculates that it can offload 11 of 65 layers to the GPU, which is why ollama ps shows GPU usage. But the runner is unable to find a GPU backend, so it runs on CPU only:
@HENScience commented on GitHub (Apr 29, 2025):
Check whether the installed Ollama build matches your system architecture (arm64 vs amd64).
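A quick way to check the host side of that match (a hypothetical diagnostic, not from the thread; the x86_64/aarch64-to-amd64/arm64 mapping is the usual convention for release artifacts):

```shell
# Report the host architecture in the naming used by release downloads.
arch=$(uname -m)
case "$arch" in
    x86_64)        echo "host is amd64" ;;
    aarch64|arm64) echo "host is arm64" ;;
    *)             echo "host is $arch" ;;
esac
```

The downloaded Ollama bundle (and its GPU runner libraries) must be built for the same architecture, or the server will fail to load the backends.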