Mirror of https://github.com/ollama/ollama.git, synced 2026-05-06 16:11:34 -05:00
[GH-ISSUE #12342] Bug: inconsistent use of VRAM and GTT on the iGPU of AMD Ryzen AI processors #54711
Closed · opened 2026-04-29 07:02:45 -05:00 by GiteaMirror · 30 comments
Originally created by @alexhegit on GitHub (Sep 19, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12342
What is the issue?
Ollama does not use the VRAM and GTT of the iGPU of an AMD Ryzen AI processor with consistent logic.
The iGPU of an AMD Ryzen AI processor has two memory pools: the VRAM size is set in the BIOS, and half of the remaining system memory becomes GTT. For example, on a 128GB DDR AMD Ryzen AI Max+ laptop - ROG Flow Z13 (2025) GZ302 - with 96GB VRAM set in the BIOS, GTT is (128 - 96) / 2 = 16GB.
Test platform:
Hardware: AMD Ryzen AI Max+ 395 laptop - ROG Flow Z13 (2025) GZ302, with 96GB VRAM set in the BIOS
OS: Ubuntu 24.04
Ollama: version 0.11.11
Test 1:
VRAM=96GB, GTT=16GB
Using radeontop to monitor memory usage, the model gpt-oss:20b is observed to load into GTT (16GB) rather than VRAM (96GB).
Expected: the model is loaded into VRAM.
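Where the weights actually land can also be checked directly from the amdgpu sysfs counters. A minimal sketch, assuming card0 and the standard mem_info_* files exposed by the amdgpu driver (adjust the card index for your system):

```go
// Quick sysfs-based alternative to radeontop for checking whether a model
// landed in VRAM or GTT. Assumes card0 and the standard amdgpu sysfs layout.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func readGiB(path string) float64 {
	raw, err := os.ReadFile(path)
	if err != nil {
		return 0
	}
	b, _ := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
	return float64(b) / (1 << 30)
}

func main() {
	const dev = "/sys/class/drm/card0/device/"
	fmt.Printf("VRAM used: %.1f / %.1f GiB\n",
		readGiB(dev+"mem_info_vram_used"), readGiB(dev+"mem_info_vram_total"))
	fmt.Printf("GTT  used: %.1f / %.1f GiB\n",
		readGiB(dev+"mem_info_gtt_used"), readGiB(dev+"mem_info_gtt_total"))
}
```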
Other tests:
Running qwen3:32b failed.
Expected: the model is loaded into VRAM.
Relevant log output
OS
Linux
GPU
AMD
CPU
AMD
Ollama version
0.11.11
@alexhegit commented on GitHub (Sep 19, 2025):
Test 2:
VRAM=8GB, GTT=60GB
Then I reset the VRAM to 8GB, while GTT is 60GB, and ran gpt-oss:20b, for which the GTT is enough to load the full model.
But ollama ps shows it running in GPU/CPU hybrid mode. It seems that ollama uses the VRAM size to estimate the memory footprint when deciding whether to run the model on the CPU, the GPU, or CPU/GPU, but then loads the model into GTT at runtime. That means ollama has a bug: it does not use consistent logic for the memory-footprint estimate and the real runtime memory usage.
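To make the reported inconsistency concrete, here is an illustrative sketch, not Ollama's actual scheduler code: the placement decision should be made against the same memory pool the runtime will actually allocate from. The decide helper and its inputs are invented for illustration; the numbers are the Test 2 figures above.

```go
// Illustrative only - not Ollama's scheduler. The point is that the pool used
// for the placement estimate must match the pool the allocation actually hits.
package main

import "fmt"

// decide places a model against a budget. On newer kernels an APU's device
// allocations may be satisfied from GTT, so the estimate should use GTT rather
// than the BIOS VRAM carve-out when that is where the weights will end up.
func decide(modelBytes, freeVRAM, freeGTT uint64, runtimeUsesGTT bool) string {
	budget := freeVRAM
	if runtimeUsesGTT {
		budget = freeGTT // keep the estimate consistent with the runtime pool
	}
	switch {
	case modelBytes <= budget:
		return "100% GPU"
	case budget > 0:
		return "GPU/CPU hybrid"
	default:
		return "100% CPU"
	}
}

func main() {
	// Test 2 figures: ~14 GiB model (gpt-oss:20b), 8 GiB VRAM, 60 GiB GTT.
	fmt.Println(decide(14<<30, 8<<30, 60<<30, true))  // estimate against GTT  -> 100% GPU
	fmt.Println(decide(14<<30, 8<<30, 60<<30, false)) // estimate against VRAM -> GPU/CPU hybrid (the reported behaviour)
}
```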
@alexhegit commented on GitHub (Sep 19, 2025):
Expected logic:
If GTT is chosen for loading the model, GTT should also be used to estimate the memory footprint when deciding between CPU, GPU, or CPU+GPU mode.
Or, more simply, use VRAM for loading the model, since
@rick-github commented on GitHub (Sep 19, 2025):
Server logs may help in debugging.
I have an evo-x2 (AMD Ryzen AI Max+ 395) with kernel 6.11.0-29-generic; with 96GB VRAM set in the BIOS, ollama loads models into VRAM.
@Ricky1975 commented on GitHub (Sep 20, 2025):
I have the same issue with a M890 Pro Mini PC (AMD Ryzen 9 8945HS w/ Radeon 780M Graphics)
Kernel 6.1.0-39-amd64, ROCk module version 6.12.12
I think the issue is:
When ollama starts up, it uses the amount of VRAM to calculate the layers to push to the GPU. When loading, it pushes the layers to the GTT.
Pushing them to GTT seems correct for me (and when I force more layers to be pushed there, it works until my GTT is full).
If I minimize my VRAM, Ollama does not even allow me to use the GTT.
Suspected solution (from a n00b):
The calculation of the available GPU-usable memory should take VRAM or GTT into account, not only VRAM. A possible implementation might be an environment variable to switch or to override (e.g. OLLAMA_USE_GTT=true or OLLAMA_VRAM_OVERRIDE=60G).
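A rough sketch of what such an override could look like. Note that OLLAMA_USE_GTT and OLLAMA_VRAM_OVERRIDE do not exist in Ollama; they are only the names proposed in this comment, and gpuBudget is an invented helper.

```go
// Illustrative only: OLLAMA_USE_GTT and OLLAMA_VRAM_OVERRIDE are hypothetical
// switches from the comment above, not real Ollama configuration.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// gpuBudget returns the memory the scheduler would assume is available on the GPU.
func gpuBudget(vramFree, gttFree uint64) uint64 {
	// e.g. OLLAMA_VRAM_OVERRIDE=60G: trust the user's figure outright.
	if v := os.Getenv("OLLAMA_VRAM_OVERRIDE"); v != "" {
		if gib, err := strconv.ParseFloat(strings.TrimSuffix(v, "G"), 64); err == nil {
			return uint64(gib * (1 << 30))
		}
	}
	// e.g. OLLAMA_USE_GTT=true: count GTT alongside VRAM.
	if os.Getenv("OLLAMA_USE_GTT") == "true" {
		return vramFree + gttFree
	}
	return vramFree // behaviour described in this thread: VRAM only
}

func main() {
	// With neither variable set, the budget is just the free VRAM
	// (8 GiB VRAM, 60 GiB GTT, as in Test 2 above).
	fmt.Printf("GPU budget: %.1f GiB\n", float64(gpuBudget(8<<30, 60<<30))/(1<<30))
}
```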
@rick-github commented on GitHub (Sep 20, 2025):
As mentioned, it works fine for me. Useful information to debug this might be found in the server logs.
@alexhegit commented on GitHub (Sep 20, 2025):
Yes, we have the same issue and the same expectation.
@rick-github commented on GitHub (Sep 20, 2025):
As mentioned, it works fine for me. Useful information to debug this might be found in the server logs.
@alexhegit commented on GitHub (Sep 20, 2025):
Have you used the same Ollama version 0.11.11 and run gpt-oss:20b? Please use ollama ps to see whether it runs with 100% CPU rather than 100% GPU. The current logic of ollama uses VRAM to judge whether the GPU memory is enough to load the model, but it loads and runs the model with GTT.
@alexhegit commented on GitHub (Sep 20, 2025):
There is a fork repo trying to solve this issue for AMD APUs (with iGPU): https://github.com/rjmalagon/ollama-linux-amd-apu
It adds a new path to use GTT for AMD APUs in https://github.com/rjmalagon/ollama-linux-amd-apu/blob/main/discover/amd_linux.go
@rick-github commented on GitHub (Sep 20, 2025):
Useful information to debug this might be found in the server logs.
@alexhegit commented on GitHub (Sep 21, 2025):
Have you used radeontop to monitor where the model is loaded, GTT or VRAM?
My test shows it uses VRAM to estimate the model's memory footprint but uses GTT to load and run the model.
Memory setting: VRAM=16GB, GTT=56GB
@rick-github commented on GitHub (Sep 21, 2025):
Useful information to debug this might be found in the server logs.
@alexhegit commented on GitHub (Sep 21, 2025):
Interesting. I cannot explain the different results between us. We are using the same ollama and the same model in the same test cases.
I re-set VRAM=96GB in the BIOS and tested again. The model is still loaded into GTT, as monitored by radeontop.
The log shows VRAM=96GB, and gpt-oss:20b runs on the AMD iGPU gfx1151.
@alexhegit commented on GitHub (Sep 21, 2025):
@rick-github
I'd like to check the gpu driver info with you. Here is mine.
@alexhegit commented on GitHub (Sep 21, 2025):
I got some information about GTT and VRAM from https://github.com/ollama/ollama/issues/5471.
The kernel changed its behavior for memory allocation between GTT & VRAM. But Ollama should consider using GTT+VRAM together for loading models: fill VRAM first, then use GTT if the VRAM size is not enough. That should work well for AMD iGPUs with UMA.
@rick-github Do you use a Linux kernel version < 6.10.0?
@rick-github commented on GitHub (Sep 21, 2025):
https://github.com/ollama/ollama/issues/12342#issuecomment-3311721448
@rick-github commented on GitHub (Sep 21, 2025):
Have you set amdgpu.no_system_mem_limit in the boot params?
@alexhegit commented on GitHub (Sep 22, 2025):
@rick-github
I have never modified this boot param.
@rick-github commented on GitHub (Sep 22, 2025):
Then maybe modify this boot param?
@alexhegit commented on GitHub (Sep 24, 2025):
Then maybe modify this boot param?
Ollama still uses GTT ahead of VRAM in my test with /sys/module/amdgpu/parameters/no_system_mem_limit = Y.
What is your setting for this sysfs parameter?
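For reference, the current value can be read straight from the sysfs path quoted above; a trivial check:

```go
// Print the amdgpu.no_system_mem_limit module parameter (path taken from the comment above).
package main

import (
	"fmt"
	"os"
)

func main() {
	v, err := os.ReadFile("/sys/module/amdgpu/parameters/no_system_mem_limit")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("amdgpu.no_system_mem_limit = %s", v)
}
```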
@rick-github commented on GitHub (Sep 24, 2025):
@MrUhu commented on GitHub (Sep 30, 2025):
@Ricky1975 I also noticed that somehow the behaviour of Ollama changed.
Previously I had the problem that Ollama used VRAM for the estimate and then loaded the model into GTT. Now it runs in VRAM, but I can't force it to load the model into GTT anymore.
It's a pity... over the last couple of days I tinkered around a bit with some scripts to make the setup process easier, but unfortunately I can't force Ollama to use the GTT anymore. That's quite a bummer because I want to be able to use more than the 16GB I can set in my BIOS.
@alexhegit commented on GitHub (Oct 10, 2025):
@MrUhu Hi, so did you forget which changes made it run from VRAM (instead of GTT)?
@MrUhu commented on GitHub (Oct 13, 2025):
No. I didn't forget.
Since one of the earlier versions, the VRAM handling changed. I don't know if it was a change in ROCm or in Ollama, but now Ollama checks VRAM and writes the model to VRAM and not GTT.
When GTT was still used, I created a custom Modelfile where I told ollama to write all layers to video memory. In my case the video memory was the GTT. But now it uses the VRAM instead - so no more GTT tinkering.
I could try an older version of Ollama, but I have to take a look at the .sh file beforehand.
@Djip007 commented on GitHub (Oct 21, 2025):
Sorry, I didn't see this issue.
There are a lot of assumptions in the comments that aren't correct, I'll try to be clear.
Ollama doesn't manage any allocations; it only configures llama.cpp.
In the ROCm/HIP backend of llama.cpp, no allocation is ever requested on the GTT; I don't even think this is possible with HIP. With HIP, you can allocate memory on the device or on the host. In the case of host allocation, it's possible to configure cache coherence for better performance.
On the llama.cpp side, there's an option to choose whether to allocate memory on the host or the device. Enabling this on a dGPU significantly reduces performance, but on an iGPU it allows you to use all of the RAM with very little loss. (And I'd say it can work on Windows.)
What changed is the AMD driver in the Linux kernel. Previously, device allocation was always done in vRAM. Since many programs don't handle allocation on the host, this severely limited the usable size. In a recent kernel (> 6.11?), AMD changed the driver so that a device allocation on an APU can be placed either in vRAM or in GTT. This made it easy to access more memory on laptops where the manufacturer doesn't allow changing the vRAM size in the BIOS.
And there were no changes to llama.cpp/ollama to handle this.
The problem lies with ollama. It doesn't physically allocate memory, but rather tries to estimate the memory size usable by the GPUs to determine how many LLM layers it can fit there. And this is where it gets complicated:
If GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON, then the memory size that llama.cpp can allocate is the RAM size. Otherwise, it's vRAM + GTT.
Note: I know there's already been some discussion on this topic, but I don't know where it stands.
On the llama.cpp side, they've recently (?) changed the way they handle this allocation: previously, it had to be done at compile time (which was a problem for ollama), but now it's doable via an env variable.
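A rough sketch of the sizing rule described in this comment, under its stated assumptions; it is not ggml's actual code, and the kernel-version boundary above is uncertain:

```go
// Illustrative ceiling on what the backend can allocate, per the comment above:
// with GGML_CUDA_ENABLE_UNIFIED_MEMORY set, allocations can come from system RAM;
// otherwise, on a recent kernel, an APU's device allocations may land in VRAM or GTT.
package main

import (
	"fmt"
	"os"
)

func allocatableCeiling(ramTotal, vramTotal, gttTotal uint64) uint64 {
	if os.Getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != "" { // e.g. "ON", as used in this thread
		return ramTotal
	}
	return vramTotal + gttTotal
}

func main() {
	// Example figures from this thread: 128 GiB RAM, 8 GiB VRAM, 60 GiB GTT.
	fmt.Printf("ceiling: %.0f GiB\n",
		float64(allocatableCeiling(128<<30, 8<<30, 60<<30))/(1<<30))
}
```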
@Djip007 commented on GitHub (Oct 21, 2025):
To finally address the iGPU case (potentially also under Windows), we need to review the way available memory is calculated and the way llama.cpp is configured. I would say, for a simple case:
(and because vRAM is not needed at all on an iGPU, it may be best to make it as small as possible...)
@rick-github commented on GitHub (Oct 21, 2025):
ollama is no longer a wrapper for llama.cpp, it doesn't configure it. It uses the same ggml.org library that llama.cpp does.
GGML_CUDA_ENABLE_UNIFIED_MEMORY is not unified memory in the sense that an iGPU and the CPU share it. It enables the dGPU to access system RAM via the PCI bus, and it is not included in the memory estimation done by ollama, except if the layers are forced onto the GPU with num_gpu.
@Djip007 commented on GitHub (Oct 22, 2025):
GGML_CUDA_ENABLE_UNIFIED_MEMORY is part of ggml, so we can still use it.
It is, as you point out, a way to use RAM on the GPU; the fact that the APU uses unified memory means that with a good config it can be used without loss of performance (or nearly).
For now the CUDA/HIP backend has an inconsistency: when this variable is set, it continues to report the size of the vRAM (+ GTT on APUs with a recent kernel) and not the size of the RAM from which it actually allocates.
I have a question: where is the memory computed, and do you use what ggml reports?
@MrUhu commented on GitHub (Nov 7, 2025):
Thanks for this info. Works for me.
If anyone is also using Fedora, here are my Ollama scripts for Fedora:
https://github.com/MrUhu/handy-fedora-scripts-for-ollama
With the update.sh script, it will update your PC, check the Ollama release page for the latest release, and only run the update command when there is a new update. Then it will add a couple of environment variables to the ollama.service file and restart the service - primarily GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON and HSA_OVERRIDE_GFX_VERSION=11.0.2. Change the HSA override to your preferred version.
With change_gtt_size_for_amd_igpu.sh, it will take your desired GTT size in GB (!!!), check your available system memory, and write the set GTT size to grubby. I've added a limiter of 50% of your available system memory. You can edit this out if you want.
And finally, overwrite_gpu_restriction_to_modelfiles.sh takes your list of models from ollama ls, looks up their layer count on ollama.com, writes this layer count to a new Modelfile with PARAMETER num_gpu, and creates these models for ollama to use.
I currently run qwen3-coder (a q4 model, 19GB total) on 4GB of VRAM allocated to my 7940HS with an RX 780M, as "myqwen3-coder" (the edited models are just called my + model name), with all layers running in 28GB of GTT.
Use at your own risk, of course.
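For anyone who wants to reproduce just the Modelfile part of that workflow by hand, a minimal sketch follows. The base model name and layer count are placeholders; the real layer count should be taken from the model's own metadata (for example via ollama show):

```go
// Sketch: write a Modelfile that pins all layers onto the GPU via "PARAMETER num_gpu"
// and register it with `ollama create`. Model name and layer count are examples only.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	const (
		base      = "qwen3-coder"   // existing model to derive from (example)
		derived   = "myqwen3-coder" // name for the pinned variant (example)
		numLayers = 48              // placeholder; use the model's real layer count
	)

	modelfile := fmt.Sprintf("FROM %s\nPARAMETER num_gpu %d\n", base, numLayers)
	if err := os.WriteFile("Modelfile", []byte(modelfile), 0o644); err != nil {
		panic(err)
	}

	cmd := exec.Command("ollama", "create", derived, "-f", "Modelfile")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```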
@namecaps3k commented on GitHub (Dec 5, 2025):
I have a similar problem, but my VRAM is set to 1GB (the lowest value I can set in the BIOS) because I want to use only GTT (the full 120GB or so). I have ollama installed the official way with the newest ROCm; ollama finds it, but it also sees the low VRAM and loads everything to GTT (or partially, if the model is small).
I can also see a user here with exactly the same issue: https://github.com/ollama/ollama/issues/12062
The thing is that all inference is done on the CPU, which is painfully slow. Any idea what I can do to run on the GPU? llama.cpp runs perfectly fine with this setup. When I set the VRAM in the BIOS to something larger, it shows 100% GPU instead of CPU and works much faster (50 tokens/s vs 20).
ai@minis:~$ ollama -v
ollama version is 0.13.1
ai@minis:~$ ollama run gpt-oss:20b hello; ollama ps
Thinking...
User says "hello". They want greeting? We should respond politely, maybe ask how can help.
...done thinking.
Hello! 👋 How can I help you today?
NAME ID SIZE PROCESSOR CONTEXT UNTIL
gpt-oss:20b 17052f91a42e 14 GB 100% CPU 4096 4 minutes from now
Nov 30 12:20:35 minis ollama[193626]: time=2025-11-30T12:20:35.977Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=00>
Nov 30 12:20:35 minis ollama[193626]: time=2025-11-30T12:20:35.977Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="1.0 GiB" threshold="20.0 GiB"
Nov 30 12:21:11 minis ollama[193626]
Nov 30 12:24:32 minis ollama[193626]: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0