Mirror of https://github.com/ollama/ollama.git
Synced 2026-05-06 16:11:34 -05:00
[GH-ISSUE #12564] Doesn't offload any layer into GPU RAM since 0.12.4 (AMD RX 7900 XTX on Windows) #54848
Closed
opened 2026-04-29 07:36:45 -05:00 by GiteaMirror · 41 comments
Originally created by @jack-running on GitHub (Oct 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12564
Originally assigned to: @dhiltgen on GitHub.
What is the issue?
On Ollama 0.12.3 it works fine; on 0.12.4 and 0.12.5 the AMD GPU is recognized but not used at all, and inference runs only on the CPU.
Relevant log output
OS
Windows
GPU
AMD
CPU
AMD
Ollama version
0.12.5
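A quick way to confirm the symptom from the command line is ollama ps, which reports how a loaded model is split between CPU and GPU (illustrative output; the model name and values below are placeholders):
ollama ps
NAME            ID              SIZE      PROCESSOR    UNTIL
llama3.2:3b     a80c4f17acd5    4.0 GB    100% CPU     4 minutes from now
A healthy run on this card would show 100% GPU instead.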
@jessegross commented on GitHub (Oct 10, 2025):
With 0.12.5:
On startup:
time=2025-10-11T00:03:35.623+02:00 level=INFO source=types.go:112 msg="inference compute" id=0 library=ROCm compute=gfx1100 name=ROCm1 description="AMD Radeon RX 7900 XTX" libdirs=ollama,rocm driver=60450.10 pci_id=03:00.0 type=discrete total="24.0 GiB" available="23.6 GiB"
We detect 23.6 GB of available VRAM.
Right before the model is loaded:
time=2025-10-11T00:03:36.242+02:00 level=INFO source=server.go:689 msg="gpu memory" id=0 library=ROCm available="370.0 MiB" free="827.0 MiB" minimum="457.0 MiB" overhead="0 B"
We only see 827 MB of free VRAM. What does Windows Task Manager show right before you launch the model?
We have improved free memory reporting for AMD on Windows with this version - it could be buggy or it could be more accurately reflecting reality.
@dhiltgen commented on GitHub (Oct 10, 2025):
If Task Manager claims much more VRAM is available, could you try again with
$env:OLLAMA_DEBUG="1"
so we can see more details in the log?
@dhiltgen commented on GitHub (Oct 10, 2025):
Actually we may need
$env:OLLAMA_DEBUG="2"
to see why the AMD GPU VRAM reporting is getting things wrong. The easiest way to get this is to quit Ollama from the system tray, then in a PowerShell terminal run:
Then in another terminal run:
@StrykeSlammerII commented on GitHub (Oct 10, 2025):
Similar issue here (AMD card, Manjaro/Linux, Intel CPU), just started after upgrading ollama to 0.12.4
I typically use
$ OLLAMA_FLASH_ATTENTION=1 ollama start
and (in 0.12.4) that log ends showing 0 VRAM regardless of what other programs I have running.
log.txt
Not sure whether this is different enough for a new ticket; I'll be happy to submit one if requested.
@esmorun commented on GitHub (Oct 11, 2025):
Same issue on a 9070 XT after upgrading to v0.12.5; downgrading to v0.12.3 solved it. I noticed this line claiming there's no free VRAM:
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon RX 9070 XT) (0000:03:00.0) - 0 MiB free
time=2025-10-11T113353.627+0200 lev.txt
@inforithmics commented on GitHub (Oct 12, 2025):
I had a look into the log and saw the following line:
runner exited" OLLAMA_LIBRARY_PATH=[/usr/lib/ollama] extra_envs="[GGML_CUDA_INIT=1 ROCR_VISIBLE_DEVICES=GPU-b3d5d3574c66244c]
ROCR_VISIBLE_DEVICES is initialized with a GPU UUID,
but it only supports numeric indices (0, 1, 2, ...), so the device is not found and 0 available memory is returned. I had to fix this in the Vulkan pull request to get correct sizes.
Replace it with:
Add the following to bootstrapDevices:
Maybe a graphics driver change suddenly returned the UUID and it stopped working, or GPU UUIDs stopped working and only device IDs (0, 1, 2) work.
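A minimal Go sketch of the idea (hypothetical types and names, not the actual ollama code): resolve the GPU UUID to the numeric index that ROCR_VISIBLE_DEVICES reliably accepts before spawning the runner.
package main

import (
	"fmt"
	"strconv"
)

// deviceInfo is a stand-in for the runner's enumerated device record
// (hypothetical; the real struct lives in ollama's discovery code).
type deviceInfo struct {
	ID string // e.g. "GPU-b3d5d3574c66244c"
}

// ordinalFor maps a GPU UUID to its enumeration index, since some ROCm
// driver builds ignore UUIDs in ROCR_VISIBLE_DEVICES and then report
// 0 bytes of available memory for the "missing" device.
func ordinalFor(devices []deviceInfo, uuid string) (string, error) {
	for i, d := range devices {
		if d.ID == uuid {
			return strconv.Itoa(i), nil
		}
	}
	return "", fmt.Errorf("GPU %s not found among %d devices", uuid, len(devices))
}

func main() {
	devs := []deviceInfo{{ID: "GPU-b3d5d3574c66244c"}}
	idx, err := ordinalFor(devs, "GPU-b3d5d3574c66244c")
	if err != nil {
		panic(err)
	}
	// Export the numeric form instead of the UUID:
	fmt.Println("ROCR_VISIBLE_DEVICES=" + idx) // prints ROCR_VISIBLE_DEVICES=0
}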
@dhiltgen commented on GitHub (Oct 12, 2025):
@esmorun and anyone else facing this issue, please run the server with OLLAMA_DEBUG="2" and share the startup logs up to the point of "inference compute", or to where the system is reporting 0 MB free when loading a model, so we can see what the problem is.
@StrykeSlammerII commented on GitHub (Oct 13, 2025):
Mine reports 0 MB free before loading a model, just when starting the server. Attached below is a log with OLLAMA_DEBUG="2" up to:
source=sched.go:493 msg="finished setting up" runner.name=registry.ollama.ai/library/PG-snow:latest runner.size="16.2 GiB" runner.vram="0 B"
There's a SIGSEGV before the model is loaded:
Not sure if that's relevant, as it still proceeds to find the correct gfx1200 device,
though it reports 0 MiB free.
Loading the model comes later and confirms 0 layers offloaded to GPU (which is atypical).
I'm running Ollama 0.12.5 from the CLI, loading the model via open-webui.
$ uname -r
6.12.51-1-MANJARO
$ ollama -v
ollama version is 0.12.5
Full log:
log.txt
@geminigeek commented on GitHub (Oct 13, 2025):
Same issue; it works with
ollama/ollama:0.12.3-rocm
with GPU AMD 5600G.
@dhiltgen commented on GitHub (Oct 13, 2025):
Thanks for the logs @StrykeSlammerII - it looks like your failure may be different from @jack-running's. Did prior versions of Ollama work correctly on your GPU? I don't have a matching GPU to test, but an RX 9070 (gfx1201) seems to be working OK on Linux. Which GPU model do you have? You may be dependent on #10676, which updates the ROCm version we're using for the official binary releases.
@StrykeSlammerII commented on GitHub (Oct 13, 2025):
Yes, Ollama was working fine until I updated recently--sorry I don't know the exact version that worked correctly.
My GPU is a RX 9060 XT.
Should I make a new issue so as not to clutter this one?
@dhiltgen commented on GitHub (Oct 13, 2025):
Yes please.
If possible, please try to share logs from a prior run that did bind to the GPU as well.
The new discovery code in 0.12.5+ tries harder to verify that the GPU will work properly, where older versions would sometimes crash for some models during inference. I'm not sure, but if you only used certain models before, it's possible you never exercised the code paths that would have led to a crash, which the new code now tries to verify at startup.
One other thing to try is setting
HSA_OVERRIDE_GFX_VERSION_0=12.0.1
and see if that changes behavior at all.
@thk-socal commented on GitHub (Oct 17, 2025):
I have the same type of issue running Docker on Ubuntu 24.04. Running 0.12.0-rocm was fine, and now with the last few releases it will NOT work with my GPU. I have even pinned it directly to the GPU by ID and told it not to use discovery. It defaults back to CPU mode.
@thk-socal commented on GitHub (Oct 17, 2025):
Here is 0.12.6-rocm
@thk-socal commented on GitHub (Oct 17, 2025):
and here is 0.12.0-rocm working. The ONLY difference is the docker image I pull.
@jack-running commented on GitHub (Oct 17, 2025):
My apologies for the delay in my response.
The issue is still there even with Ollama 0.12.6.
The startup logs:
and then the logs running the inference:
@dhiltgen commented on GitHub (Oct 17, 2025):
@jack-running the GPU discovery output looks like it found your discrete GPU. Could you include a bit more of the portion of the log you omitted in the "...", which looks something like this?
@dhiltgen commented on GitHub (Oct 17, 2025):
@thk-socal can you try running with OLLAMA_DEBUG=2 and not setting any overrides for visible devices or the LLM Library to use, and if it still doesn't find your GPU, share the logs up to the point of "inference compute" reporting CPU?
@thk-socal commented on GitHub (Oct 17, 2025):
I removed all the previous environment variables and mappings that I had been using for a long time. I went very generic, and now it appears to be doing discovery correctly, so the new auto-configuration is working well. The only environment variable I have set now is TZ, plus one volume mapped into the Docker container for the ollama root folder.
Tried 0.12.3, 0.12.5, and 0.12.6, all now identifying the GPU successfully, but the models will NOT load and run. I'll gather that data later when I have a moment.
@thk-socal commented on GitHub (Oct 17, 2025):
and now back to this:
@dhiltgen commented on GitHub (Oct 17, 2025):
@thk-socal it looks like ROCm is taking a really long time to initialize your GPU and occasionally hitting a timeout. I'll look at increasing the timeout and see if we can do anything to speed it up.
#12681 should mitigate this while we look to find ways to make it faster.
Until that ships in a future release, my best guess is that the GPU has gone into a low-power/idle state and takes a long time to warm back up. If you can find a way to warm it up before Ollama starts, or immediately restart Ollama, that should work around this until then.
@thk-socal commented on GitHub (Oct 17, 2025):
I have also rebuilt the host from scratch now and will watch how it does. 24.04.03 with 6.14 HWE kernel and 7.0.2 drivers installed on the host. New DEBUG logs from first run.
time=2025-10-17T20:38:00.100Z level=INFO source=routes.go:1511 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-10-17T20:38:00.101Z level=INFO source=images.go:522 msg="total blobs: 0" time=2025-10-17T20:38:00.101Z level=INFO source=images.go:529 msg="total unused blobs removed: 0" time=2025-10-17T20:38:00.101Z level=INFO source=routes.go:1564 msg="Listening on [::]:11434 (version 0.12.6)" time=2025-10-17T20:38:00.101Z level=DEBUG source=sched.go:123 msg="starting llm scheduler" time=2025-10-17T20:38:00.101Z level=INFO source=runner.go:80 msg="discovering available GPUs..." time=2025-10-17T20:38:00.101Z level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] time=2025-10-17T20:38:00.101Z level=TRACE source=runner.go:529 msg="starting runner for device discovery" env="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=compute-01 TZ=America/Los_Angeles OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 NVIDIA_DRIVER_CAPABILITIES=compute,utility NVIDIA_VISIBLE_DEVICES=all OLLAMA_HOST=0.0.0.0:11434 HOME=/root OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm]" cmd="/usr/bin/ollama runner --ollama-engine --port 46473" time=2025-10-17T20:38:00.109Z level=INFO source=runner.go:1332 msg="starting ollama engine" time=2025-10-17T20:38:00.109Z level=INFO source=runner.go:1367 msg="Server listening on 127.0.0.1:46473" time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-17T20:38:00.114Z level=DEBUG source=gguf.go:578 msg=general.architecture type=string time=2025-10-17T20:38:00.114Z level=DEBUG source=gguf.go:578 msg=tokenizer.ggml.model type=string time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0 time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default="" time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default="" time=2025-10-17T20:38:00.114Z level=INFO source=ggml.go:134 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2025-10-17T20:38:00.114Z level=DEBUG 
source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2025-10-17T20:38:00.120Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32, ID: GPU-44d96d7bf1798b3e load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so time=2025-10-17T20:38:01.535Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-10-17T20:38:01.535Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.key_length default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2025-10-17T20:38:01.536Z 
level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:1307 msg="dummy model load took" duration=1.423881543s time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:1312 msg="gathering device infos took" duration=26.978µs time=2025-10-17T20:38:01.536Z level=TRACE source=runner.go:548 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:GPU-44d96d7bf1798b3e Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilteredID: Integrated:false PCIID:06:00.0 TotalMemory:25753026560 FreeMemory:25715277824 ComputeMajor:17 ComputeMinor:0 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]" time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=1.43527571s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:118 msg="filtering out unsupported or overlapping GPU library combinations" count=1 time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:130 msg="verifying GPU is supported" library=/usr/lib/ollama/rocm description="AMD Radeon Graphics" compute=gfx1100 pci_id=06:00.0 time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="[GGML_CUDA_INIT=1 ROCR_VISIBLE_DEVICES=GPU-44d96d7bf1798b3e]" time=2025-10-17T20:38:01.537Z level=TRACE source=runner.go:529 msg="starting runner for device discovery" env="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=compute-01 TZ=America/Los_Angeles OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 NVIDIA_DRIVER_CAPABILITIES=compute,utility NVIDIA_VISIBLE_DEVICES=all OLLAMA_HOST=0.0.0.0:11434 HOME=/root OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm GGML_CUDA_INIT=1 ROCR_VISIBLE_DEVICES=GPU-44d96d7bf1798b3e]" cmd="/usr/bin/ollama runner --ollama-engine --port 46607" time=2025-10-17T20:38:01.544Z level=INFO source=runner.go:1332 msg="starting ollama engine" time=2025-10-17T20:38:01.544Z level=INFO source=runner.go:1367 msg="Server listening on 127.0.0.1:46607" time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-17T20:38:01.547Z level=DEBUG source=gguf.go:578 msg=general.architecture type=string time=2025-10-17T20:38:01.547Z level=DEBUG source=gguf.go:578 msg=tokenizer.ggml.model type=string time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0 time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default="" time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default="" time=2025-10-17T20:38:01.547Z level=INFO source=ggml.go:134 msg="" 
architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2025-10-17T20:38:01.551Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: ggml_cuda_init: initializing rocBLAS on device 0 ggml_cuda_init: rocBLAS initialized on device 0 Device 0: AMD Radeon Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32, ID: GPU-44d96d7bf1798b3e load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so time=2025-10-17T20:38:03.317Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key 
with type not found" key=llama.attention.key_length default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2025-10-17T20:38:03.318Z level=DEBUG source=runner.go:1307 msg="dummy model load took" duration=1.77050867s time=2025-10-17T20:38:03.318Z level=DEBUG source=runner.go:1312 msg="gathering device infos took" duration=18.181µs time=2025-10-17T20:38:03.318Z level=TRACE source=runner.go:548 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:GPU-44d96d7bf1798b3e Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilteredID: Integrated:false PCIID:06:00.0 TotalMemory:25753026560 FreeMemory:25193086976 ComputeMajor:17 ComputeMinor:0 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]" time=2025-10-17T20:38:03.318Z level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=1.781482227s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="[GGML_CUDA_INIT=1 ROCR_VISIBLE_DEVICES=GPU-44d96d7bf1798b3e]" time=2025-10-17T20:38:03.318Z level=TRACE source=runner.go:171 msg="supported GPU library combinations" supported=map[ROCm:map[/usr/lib/ollama/rocm:map[GPU-44d96d7bf1798b3e:0]]] time=2025-10-17T20:38:03.318Z level=DEBUG source=runner.go:45 msg="GPU bootstrap discovery took" duration=3.217050308s time=2025-10-17T20:38:03.318Z level=INFO source=types.go:112 msg="inference compute" id=GPU-44d96d7bf1798b3e library=ROCm compute=gfx1100 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=06:00.0 type=discrete total="24.0 GiB" available="23.9 GiB"
@thk-socal commented on GitHub (Oct 17, 2025):
Next set of errors after it loaded up a model, responded, and then I ran another small model:
time=2025-10-17T20:56:18.269Z level=INFO source=runner.go:545 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] error="failed to finish discovery before timeout"
time=2025-10-17T20:56:18.269Z level=WARN source=runner.go:347 msg="unable to refresh free memory, using old values"
time=2025-10-17T20:56:20.016Z level=INFO source=runner.go:545 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] error="failed to finish discovery before timeout"
time=2025-10-17T20:56:20.017Z level=WARN source=runner.go:347 msg="unable to refresh free memory, using old values"
time=2025-10-17T20:56:20.017Z level=INFO source=runner.go:545 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] error="failed to finish discovery before timeout"
time=2025-10-17T20:56:20.017Z level=WARN source=runner.go:347 msg="unable to refresh free memory, using old values"
@dhiltgen commented on GitHub (Oct 17, 2025):
@thk-socal from those latest logs it looks like it did discover your GPU and run inference on it. Is that correct?
@thk-socal commented on GitHub (Oct 17, 2025):
@dhiltgen it did discover and run inference correctly, but then it appeared to have a memory issue and break. Dropping back to 0.12.3 now to see what happens, and I will continue trying to narrow down the issue.
@dhiltgen commented on GitHub (Oct 17, 2025):
@thk-socal depending on which models you're trying to use, you can try OLLAMA_NEW_ENGINE=1 to get the benefits of the new memory management logic, which might help.
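For a Docker setup like the one described in this thread, that might look like the following (a sketch based on the standard ollama/ollama:rocm run command from the Docker docs, with the flag added):
docker run -d --device /dev/kfd --device /dev/dri \
  -e OLLAMA_NEW_ENGINE=1 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:rocm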
@thk-socal commented on GitHub (Oct 20, 2025):
@dhiltgen in previous versions, it did not seem to matter if the OS had power-managed the GPU when not in use. Since some of the latest versions, that appears to be an issue. I just told Linux not to power manage that specific PCI card, and things appear to be stabilizing. I am wondering if the new code does not allow time to spin the GPU back up from a power-minimizing state.
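For reference, opting a single PCI device out of Linux runtime power management is typically done through sysfs; a sketch, with the GPU's PCI address as a placeholder:
echo on | sudo tee /sys/bus/pci/devices/0000:03:00.0/power/control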
@dhiltgen commented on GitHub (Oct 20, 2025):
@thk-socal thanks for that info. We do want GPUs to be able to go back to a low-power state when not in use. I've merged a change to give the system more time when discovering AMD GPUs, which should help mitigate this, but I'm hoping we can find a way to speed up the process of waking the device back up.
@jack-running commented on GitHub (Oct 21, 2025):
It looks like it found the discrete GPU only in the first case, but then it doesn't use the RX 7900 XTX 24 GB GPU at all; everything runs as if there is only the onboard GPU, and only the CPU is used for inference.
I'm attaching the whole debug log file from the commands you gave previously.
ollama_0.12.6_DEBUG_2.txt
time=2025-10-17T14:42:56.662+02:00 level=DEBUG source=server.go:915 msg="available gpu" id=0 library=ROCm "available layer vram"="1.4 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="123.2 MiB"
time=2025-10-17T14:42:56.662+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers="3[ID:0 Layers:3(21..23)]"
time=2025-10-17T14:42:56.662+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:8 GPULayers:3[ID:0 Layers:3(21..23)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-17T14:42:56.696+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-17T14:42:56.697+02:00 level=TRACE source=ggml.go:277 msg="created tensor" name=blk.0.attn_k.bias shape=[512] dtype=0 buffer_type=CPU
time=2025-10-17T14:42:56.697+02:00 level=TRACE source=ggml.go:277 msg="created tensor" name=blk.0.attn_k.weight shape="[2880 512]" dtype=30 buffer_type=CPU
@dhiltgen commented on GitHub (Oct 21, 2025):
@jack-running it looks like the IDs may be getting mixed up and it's incorrectly looking up the iGPU information when it should be matching the discrete GPU. This might be fixed by #12540; however, as a temporary workaround, you can try setting
HIP_VISIBLE_DEVICES=1
until we get this fixed.
@dhiltgen commented on GitHub (Oct 29, 2025):
The new release 0.12.7 should resolve these GPU discovery problems. Please upgrade and give it a try. If you're still having problems, please share a new server log with OLLAMA_DEBUG=2 set so we can take another look.
@thk-socal commented on GitHub (Oct 30, 2025):
Looks good so far. I left all device IDs commented out and they were discovered properly. I am running with only the new engine flag. Loaded and ran inference... now to see what it does after sitting for a few hours before I run it again.
Thanks for all the work!
@thk-socal commented on GitHub (Oct 30, 2025):
@dhiltgen worked great for the first run. Then, after I let it sit for 30 minutes or so, I tried again and it hung. The first run's model appeared to still be in GPU memory, and it would not unload it or use it again. I restarted the Docker container and the memory was STILL locked from the previous run, so instead of 24 GB I had 11 GB. Not enough to load the 13 GB model, and it just hung.
Not sure yet why.
@dhiltgen commented on GitHub (Oct 30, 2025):
@thk-socal bouncing the docker container should clear everything up. Maybe there's a docker bug or amdgpu driver bug? You could try bouncing the docker service itself to see if that clears things up to help isolate where the problem is.
@thk-socal commented on GitHub (Oct 30, 2025):
@dhiltgen bounce/kill/etc. didn't help; I had to reboot while 0.12.7 was running. Now on 0.12.1 I do not have the problem. It is like it does not unload the model properly after the time period expires. 0.12.1 is working as intended without issues. Again, the first run on 0.12.7 was perfect, but after the standard 5-minute timeout for unloading models, something hangs. I will try other things to narrow it down.
@dhiltgen commented on GitHub (Oct 30, 2025):
One data point that will be helpful to understand is whether the actual VRAM usage stays up, or whether we're getting bad information about VRAM usage for some reason. On Windows, we are using a Windows-specific AMD library that yields accurate VRAM information, but on Linux, we're leveraging ROCm APIs in the new discovery code. The old discovery code used sysfs.
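To see what the hardware itself reports on Linux, independent of Ollama's discovery path, the amdgpu sysfs counters (the old discovery source) or rocm-smi can be read directly; the card index is a placeholder:
cat /sys/class/drm/card0/device/mem_info_vram_used
cat /sys/class/drm/card0/device/mem_info_vram_total
rocm-smi --showmeminfo vram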
@jack-running commented on GitHub (Oct 30, 2025):
I was very excited when I saw your message about 0.12.7 being available and I ran the tests with 0.12.7 as soon as possible, but my excitement faded away quite quickly :(
It seems that inference by default still runs on the CPU despite detecting the 24 GB GPU, and even with the suggested environment variable HIP_VISIBLE_DEVICES=1 set, I couldn't get it running any better. (It seems I need to go back to version 0.12.3, which still runs fine on the RX 7900 XTX.)
Here are the logs:
serve_2025-10-30.log
serve_2025-10-30_HIP1.log
@dhiltgen commented on GitHub (Oct 30, 2025):
@jack-running thanks for the updated logs. I think I see what is going wrong and I'm working on a fix...
@thk-socal commented on GitHub (Oct 31, 2025):
@dhiltgen I rolled the AMD GPU Linux driver back from 7.1 to 6.4, and that cleared this up as well.
@dhiltgen commented on GitHub (Oct 31, 2025):
@thk-socal thanks for that data point! Please go ahead and file a new issue noting that driver 7.1 seems to have problems with current Ollama so we can track that.
@jack-running commented on GitHub (Oct 31, 2025):
0.12.8 works like a charm!