Mirror of https://github.com/ollama/ollama.git (synced 2026-05-06)
Integrated AMD GPU support (ollama/ollama#2637) · Closed · 171 comments
Originally created by @DocMAX on GitHub (Feb 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2637
Originally assigned to: @dhiltgen on GitHub.
Opening a new issue (see https://github.com/ollama/ollama/pull/2195) to track support for integrated GPUs. I have an AMD 5800U CPU with integrated graphics. As far as I have researched, ROCR lately supports integrated graphics too.
Currently Ollama seems to ignore iGPUs in general.
@GZGavinZhao commented on GitHub (Feb 22, 2024):
ROCm's support for integrated GPUs is not that good. This issue may largely depend on AMD's progress on improving ROCm.
@DocMAX commented on GitHub (Feb 22, 2024):
OK, but I would like to have an option to enable it, just to check if it works.
@DocMAX commented on GitHub (Feb 22, 2024):
This is what I get with the new Docker image (ROCm support). It detects the Radeon and then says no GPU detected?!?
@GZGavinZhao commented on GitHub (Feb 22, 2024):
Their AMDDetected() function is a bit broken and I haven't figured out a fix for it.
@sid-cypher commented on GitHub (Feb 23, 2024):
I've seen this behavior in #2411, but only with the version from ollama.com.
Try it with the latest released binary?
https://github.com/ollama/ollama/releases/tag/v0.1.27
@GZGavinZhao commented on GitHub (Feb 23, 2024):
Yes, latest release fixed this behavior.
@DocMAX commented on GitHub (Feb 23, 2024):
I had a permission issue with lxc/docker. Now:
So as the topic says, please add integrated GPU support (AMD 5800U here)
@robertvazan commented on GitHub (Feb 24, 2024):
Latest (0.1.27) docker image with ROCm works for me on Ryzen 5600G with 8GB VRAM allocation. Prompt processing is 2x faster than with CPU. Generation runs at max speed even if CPU is busy running other processes. I am on Fedora 39.
Container setup:
environment: HCC_AMDGPU_TARGETS=gfx900 (unnecessary)
devices: /dev/dri/card1, /dev/dri/renderD128, /dev/dri, /dev/kfd
additional options (unnecessary): --group-add video --security-opt seccomp:unconfined
It's still shaky, however:
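For reference, a hedged sketch of a docker run invocation assembling the pieces above (the ollama/ollama:rocm image name, volume, and port come from later comments in this thread; device paths vary per system):
docker run -d --device /dev/kfd --device /dev/dri \
  --group-add video --security-opt seccomp=unconfined \
  -e HCC_AMDGPU_TARGETS=gfx900 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:rocm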
@robertvazan commented on GitHub (Feb 24, 2024):
See also discussion in the #738 epic.
@DocMAX commented on GitHub (Feb 24, 2024):
Why does it work for you??
Still not working here.
Also the non-docker version doesn't work...
@dhiltgen please have a look
@DocMAX commented on GitHub (Feb 24, 2024):
And by the way there is no /sys/module/amdgpu/version. You have to correct the code.
@robertvazan commented on GitHub (Feb 24, 2024):
Ollama skipped the iGPU, because it has less than 1GB of VRAM. You have to configure VRAM allocation for the iGPU in BIOS to something like 8GB.
@DocMAX commented on GitHub (Feb 24, 2024):
Thanks i will check if i can do that.
But normal behaviour for the iGPU should be that it requests more VRAM if needed.
@robertvazan commented on GitHub (Feb 24, 2024):
Why do you think so? Where is it documented? Mine maxes at 512MB unless I explicitly configure it in BIOS.
@sid-cypher commented on GitHub (Feb 24, 2024):
Detecting and using this VRAM information without sharing with the user the reason for the iGPU rejection leads to "missing support" issues being opened, rather than "increase my VRAM allocation" steps taken. I think the log output should be improved in this case. This task would probably qualify for a "good first issue" tag, too.
@DocMAX commented on GitHub (Feb 24, 2024):
Totally agree!
@chiragkrishna commented on GitHub (Feb 24, 2024):
I have 2 systems.
The Ryzen 5500U system always gets stuck here. I've allotted 4GB of VRAM for it in the BIOS; that's the max.
export HSA_OVERRIDE_GFX_VERSION=9.0.0
export HCC_AMDGPU_TARGETS=gfx900
building with
my 6750xt system works perfectly
@DocMAX commented on GitHub (Feb 24, 2024):
OK i was wrong. Works now with 8GB VRAM, thank you!
@DocMAX commented on GitHub (Feb 24, 2024):
Hmm, i see the model loaded into VRAM, but nothing happens...
@DocMAX commented on GitHub (Feb 24, 2024):
Do I need a different amdgpu module on the host than the one from the kernel (6.7.6)?
@sid-cypher commented on GitHub (Feb 24, 2024):
Maybe, https://github.com/ROCm/ROCm/issues/816 seems relevant. I'm just using AMD-provided DKMS modules from https://repo.radeon.com/amdgpu/6.0.2/ubuntu to be sure.
@DocMAX commented on GitHub (Feb 24, 2024):
Hmm, the tinyllama model does work with the 5800U. The bigger ones get stuck as I mentioned before.
Edit: Codellama works too.
@chiragkrishna commented on GitHub (Feb 25, 2024):
I added this "-DLLAMA_HIP_UMA=ON" to "ollama/llm/generate/gen_linux.sh".
Now it's stuck here:
@robertvazan commented on GitHub (Feb 25, 2024):
iGPUs indeed do allocate system RAM on demand. It's called GTT/GART. Here's what I get when I run sudo dmesg | grep "M of" on my system with 32GB RAM.
If I set VRAM to Auto in BIOS:
If I set VRAM to 8GB in BIOS:
If I set VRAM to 16GB in BIOS:
It looks like GTT size is 0.5*(RAM-VRAM). I wonder how far this can go if you have 64GB or 96GB RAM. Can you have an iGPU with 32GB or 48GB of GTT memory? That would make a $200 APU with $200 of DDR5 RAM superior to a $2,000 dGPU for running Mixtral and future sparse models. I also wonder whether any BIOS offers a 32GB VRAM setting if you have 64GB of RAM.
Unfortunately, ROCm does not use GTT. That thread mentions several workarounds (torch-apu-helper, force-host-alloction-APU, Rusticl, unlock VRAM allocation), but I am not sure whether Ollama would be able to use any of them. Chances are highest in docker container where Ollama has greatest control over dependencies.
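A quick, hedged way to inspect the VRAM/GTT split on a running system (these sysfs nodes are provided by the amdgpu driver; the card index is an assumption):
cat /sys/class/drm/card0/device/mem_info_vram_total   # BIOS-carved VRAM, in bytes
cat /sys/class/drm/card0/device/mem_info_gtt_total    # GTT limit, in bytes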
@DocMAX commented on GitHub (Feb 25, 2024):
Very cool findings. Interesting you mention 96GB. I did some research and it seems that's the max we can buy right now for SO-DIMMs. Wasn't aware it's called GTT. Let's hope someday we get support for this.
If the host can't handle GTT for ROCm, then I doubt Docker can do anything about it.
https://github.com/segurac/force-host-alloction-APU looks like the best solution to me if it works. Will try in my docker containers...
This is how much i would get :-) (64GB system)
@DocMAX commented on GitHub (Feb 25, 2024):
OK, it doesn't work with ollama. I wasn't aware that it doesn't use PyTorch, right?
@chiragkrishna commented on GitHub (Feb 26, 2024):
llama.cpp supports it; that's what I was trying to do in my previous post: Support AMD Ryzen Unified Memory Architecture (UMA).
@robertvazan commented on GitHub (Feb 26, 2024):
@chiragkrishna Do you mean this? https://github.com/ggerganov/llama.cpp/pull/4449
Since llama.cpp already supports UMA (GTT/GART), Ollama could perhaps include a llama.cpp build with UMA enabled and use it when the conditions are right (AMD iGPU with VRAM smaller than the model).
PS: UMA support seems a bit unstable, so perhaps enable it with environment variable at first.
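For illustration, a hedged sketch of building llama.cpp with that flag (option names follow the UMA pull request referenced above; the ROCm compiler paths and the gfx900 target are assumptions):
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
  cmake -B build -DLLAMA_HIPBLAS=ON -DLLAMA_HIP_UMA=ON -DAMDGPU_TARGETS=gfx900
cmake --build build -j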
@DocMAX commented on GitHub (Feb 26, 2024):
How does the env thing work? Like this? (Doesn't do anything btw)
LLAMA_HIP_UMA=1 HSA_OVERRIDE_GFX_VERSION=9.0.0 HCC_AMDGPU_TARGETS==gfx900 ollama start
@robertvazan commented on GitHub (Feb 26, 2024):
@DocMAX I don't think there's UMA support in ollama yet. It's a compile-time option in llama.cpp. The other env variables (HSA_OVERRIDE_GFX_VERSION was sufficient in my experiments) are correctly passed down to ROCm.
@chiragkrishna commented on GitHub (Feb 26, 2024):
git clone and add them here
@dhiltgen commented on GitHub (Feb 26, 2024):
I haven't dug deeply into this yet, but from what I've seen, I believe we'll need a second ROCm variant compiled with system/unified memory support to support modern iGPUs. Setting these flags in llama.cpp will degrade performance on discrete GPUs, but since we already have a model for supporting multiple variants, it shouldn't be a problem to have both.
I'm working on some refinements to amdgpu discovery to try to pivot over to pure sysfs discovery which should help here.
@DocMAX commented on GitHub (Feb 26, 2024):
Did so, but i still get "no GPU detected"...
@chiragkrishna commented on GitHub (Feb 27, 2024):
build:
run:
@DocMAX commented on GitHub (Feb 27, 2024):
I did exactly that, but it's not working... very strange. CPU: AMD 5800U
@chiragkrishna commented on GitHub (Feb 28, 2024):
Try playing with these.
Ollama has a few broken checks for AMD integrated GPUs currently.
@DocMAX commented on GitHub (Feb 28, 2024):
Nope, doesn't make a difference :-(
@chiragkrishna commented on GitHub (Feb 28, 2024):
change "tooOld" to this and compile and see.
ollama/gpu/gpu.go from line 173
even if your gpu is detected you will be stuck at my place i guess.
@DocMAX commented on GitHub (Feb 28, 2024):
Still no GPU, i give up.
@DocMAX commented on GitHub (Mar 8, 2024):
Something happened now...
I compiled with "-DLLAMA_HIP_UMA=ON"... So UMA still not working...
@chiragkrishna commented on GitHub (Mar 8, 2024):
Compiled just now. No luck with the Ryzen 5500U.
@robertvazan commented on GitHub (Mar 9, 2024):
The latest update of the Docker image introduced an upgrade to ROCm 6.0, which dropped support for gfx900, so now the Ryzen 5600G does not work even with HSA_OVERRIDE_GFX_VERSION. AMD screwed us. The last working version is 0.1.27 with ROCm 5.7.
@dhiltgen promised support for multiple ROCm versions. I am looking forward to it.
@robertvazan commented on GitHub (Mar 9, 2024):
Also looking forward to Vulkan support (#2033, #2578), which looks like a better solution than ROCm.
@kirel commented on GitHub (Mar 9, 2024):
My AMD Ryzen 7 7840HS with Radeon 780M Graphics works great with HSA_OVERRIDE_GFX_VERSION=11.0.0. I set VRAM to UM_SPECIFIED and 16G (I have 32G of RAM) in the BIOS of my minisforum um780xtx mini PC.
@DocMAX commented on GitHub (Mar 9, 2024):
Yeah, sure that works. But this is about getting Ollama to run with UMA memory and "auto" mode in the BIOS!
@robertvazan commented on GitHub (Mar 9, 2024):
@kirel Your iGPU is RDNA3, which is still supported by ROCm. ROCm definitely works, it's just that they deprecate hardware really quickly (my CPU is 6 months old). Vulkan will hopefully provide wider and more long-lived support without any hacks.
@taweili commented on GitHub (Mar 10, 2024):
I managed to get Ollama and llama.cpp to run on the 5700G with export HSA_ENABLE_SDMA=0. The performance gain isn't much, but I am also looking into the hipHostMalloc hack. You can see more info here.
@robertvazan commented on GitHub (Mar 11, 2024):
I can confirm this works with Ryzen 5600G and ROCm 6.0.
It would be ideal to have these overrides stored centrally in ROCm, llama.cpp, or Ollama code.
@DocMAX commented on GitHub (Mar 12, 2024):
Doesn't work yet: "not enough vram available, falling back to CPU only"
@ddpasa commented on GitHub (Mar 13, 2024):
vulkan can really help here: https://github.com/ollama/ollama/pull/2578
llama.cpp has some vulkan support, but it's in very early stages. You can try the PR above if that helps.
@DocMAX commented on GitHub (Mar 16, 2024):
so any updates on this? how can GTT memory be enabled?
@ericcurtin commented on GitHub (Apr 12, 2024):
Hit this also 😄
on an octa-core "AMD Ryzen 7 5700U with Radeon Graphics"
Maybe I should buy myself a GPU :)
@ericcurtin commented on GitHub (Apr 12, 2024):
Does anybody have experience with connecting something like:
a "Radeon RX 7600" GPU
to a PCIe slot designed for an NVMe drive? Any recommendations for an adapter?
I want to learn about AI with ollama :)
@qkiel commented on GitHub (Apr 20, 2024):
@DocMAX thanks for the tip on the compilation. It works with one additional line of code.
Let me explain the process from the start, so others can follow. I have AMD 5600G APU and use Ubuntu 22.04.
Compiling Ollama requires newer versions of cmake and go than the ones available in Ubuntu 22.04:
cmake version 3.24 or higher
go version 1.22 or higher
gcc version 11.4.0 or higher
ROCm 6.0.3 or 6.1
libclblast for AMD
First install ROCm using the official instructions. I'm using version 6.1 even though officially it no longer supports GCN-based iGPUs.
Next, install some required packages:
And finally install cmake and go from official pages:
We need to add the extracted directories to the PATH. Open .profile with a text editor and add this line at the end (/home/ubuntu/ depends on your user, so change it accordingly):
Now use the source ~/.profile command to make sure the environment variable is set.
Getting the Ollama source code is simple: use the git clone command with a tag of the latest release.
Let's make two changes in the source code. In the ollama/llm/generate/gen_linux.sh file, find a line that begins with if [ -d "${ROCM_PATH}" ]; then. A few lines under it, there is a line that begins with CMAKE_DEFS=.
Because Ollama uses llama.cpp under the hood, we can add there the environment variables required for an APU. In my case these are -DLLAMA_HIP_UMA=on -DHSA_ENABLE_SDMA=off -DHSA_OVERRIDE_GFX_VERSION=9.0.0. I also changed both -DAMDGPU_TARGETS=$(amdGPUs) and -DGPU_TARGETS=$(amdGPUs) to gfx900 (this value depends on your iGPU, of course). It should look like this:
The second thing we have to change is in the ollama/gpu/amd_linux.go file. Find a line that begins with if totalMemory < IGPUMemLimit {. Just before it, add totalMemory = 16 * format.GibiByte, where the value 16 is how much VRAM Ollama can use for the models. I wouldn't go beyond your_RAM_in_GB - 8. This code should look like this:
Now Ollama thinks my iGPU has 16 GB of VRAM assigned to it and doesn't complain. Up to 16 GB will be used when Ollama is running and models are loaded, but when we stop the container, our RAM will be free again.
Compile Ollama:
Now you can run it with this command:
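A hedged sketch of the overall sequence described in this comment (the tag is an example from later in the thread; the edits to gen_linux.sh and amd_linux.go are the manual changes described above and are not shown):
git clone --depth 1 --branch v0.1.38 https://github.com/ollama/ollama.git && cd ollama
# ...apply the gen_linux.sh and amd_linux.go edits described above...
go generate ./...
go build .
HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 ./ollama serve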
@rjmalagon commented on GitHub (Apr 22, 2024):
Yup, this trick works pretty well if you know what you are doing.
Command R was usable on the GPU (>32GB of RAM just to load), and I can raise context windows to 32000 tokens (>60GB of RAM), around 80G without major hiccups. Ryzen 5600G here with 128G of RAM. I modified the source and variables in the Dockerfile for a very AMD-APU-friendly ollama ROCm container.
@DimitriosKakouris commented on GitHub (May 7, 2024):
I run ollama in Docker with the official ollama:rocm image, and I run the container with docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 -e HSA_OVERRIDE_GFX_VERSION="11.0.0" --name ollama ollama/ollama:rocm, which is the same HSA override that @kirel used. I also have the AMD 7840HS with Radeon 780M (gfx1103), but when I type an instruction, instead of an answer I get a blank (white) screen with only the mouse cursor working. It does not go away unless I hard-restart.
@kirel commented on GitHub (May 7, 2024):
@ntua-el19019 did you set your VRAM to a high enough value in the BIOS? I set UM_SPECIFIED to 16 gigs. And here is my full docker-compose.yml
@DimitriosKakouris commented on GitHub (May 7, 2024):
Let me try to increase the VRAM.
@DimitriosKakouris commented on GitHub (May 7, 2024):
Damn, it worked! Also, changing the boot parameter to amdgpu.sg_display=0 removed the blank screen! Thank you.
@kirel commented on GitHub (May 7, 2024):
@rjmalagon do you have your updated Dockerfile somewhere?
@DocMAX commented on GitHub (May 7, 2024):
I haven't looked into it in a while. It seems some text changed, but it still doesn't work if GPU memory is set to "auto" in the BIOS.
Anything I can do?
@qkiel commented on GitHub (May 8, 2024):
Ollama thinks you have too little VRAM available, even though llama.cpp can support UMA and use your RAM. The workaround is to compile ollama with two little changes in the source code. The solution is just a few posts above in this thread:
https://github.com/ollama/ollama/issues/2637#issuecomment-2067766641
@rjmalagon commented on GitHub (May 9, 2024):
If you don't mind too much, you can use the current Dockerfile and modify gpu/amd_linux.go.
In this example, I multiply my current 8GB of iGPU VRAM by 15 to match my 120GB of physical RAM (128GB - 8GB).
And modify llm/generate/gen_linux.sh to add "-DLLAMA_HIP_UMA=on" to the ROCm cmake defs.
Then just use docker build with --build-arg=AMDGPU_TARGETS="gfx900" (replace the target with your AMD APU's).
I can share a lightly modified ollama Docker image with you if you just want to test it.
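A hedged sketch of the build invocation being described (the image tag is made up for illustration):
docker build --build-arg AMDGPU_TARGETS="gfx900" -t ollama-apu-rocm .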
@kirel commented on GitHub (May 9, 2024):
There is actually another undocumented env var OLLAMA_CUSTOM_ROCM_DEFS (at least in main). I was able to compile with:
Adapt to your APU, obviously!
And then my only source code modification is totalMemory = ...
@rjmalagon commented on GitHub (May 9, 2024):
@kirel Yup, your way is cleaner than my hackish way. I have a highly modified environment for special purposes and quickly adapted a succinct way to replicate the essentials of AMD APU support for this thread; your post is way easier for others to follow.
@kirel commented on GitHub (May 9, 2024):
I've been trying to understand how, at least on Linux, ollama determines the memory available. In my case, reading from /sys/class/kfd/kfd/topology/nodes/0/mem_banks/0/properties, in the line that starts with size_in_bytes, I can get the CPU memory, and from https://www.kernel.org/doc/html/v4.19/gpu/amdgpu.html I interpret that 3/4 of that is what can be allocated. So I changed the line accordingly, and with 64GB of RAM I can now load llama3:70b fully into the "gpu". It's slow but works.
So a proper solution should understand that we have an iGPU and collect totalMemory automatically, roughly like I did manually now?
@santo998 commented on GitHub (May 16, 2024):
I have Ryzen 5 3400g APU and configured VRAM = 8 GB in BIOS.
However, ollama is only using the CPU.
I'm using ollama for Windows (version 0.1.38, without Docker or anything else), running "ollama run phi3" from the command line.
Here you can see my task manager and "ollama ps" command output:
@rjmalagon commented on GitHub (May 18, 2024):
@santo998 common Ollama binaries will not work for you. Your GPU is old (older than mine) and unsupported by default (just as mine). We did a custom Ollama build with some unofficial changes to the GPU memory count and forced old GPU and main RAM support within ROCM.
I am not familiar with Windows building, but I may be able to help you with the necessary changes to the Ollama source and build scripts.
@santo998 commented on GitHub (May 18, 2024):
@rjmalagon why aren't those changes included in the official ollama source code?
Or at least in a fork...
I can try it on Ubuntu too. What are the changes I should make?
@rjmalagon commented on GitHub (May 18, 2024):
I can only guess, but I strongly feel it's because ROCm support for GPU compute on old AMD iGPUs is incomplete and lacking.
Even with a newer and stronger iGPU, like on the Ryzen 5600G, it is not that much faster (~10% to ~15%) than pure CPU (6 cores / 12 threads), and I doubt the Ryzen 3400G's GPU will give you significant performance. Main RAM speed is a bottleneck by itself too. Ollama on a modest AMD iGPU is useful in very specific local uses on small models, because you get free CPU to spare while inferencing. With big models (70b+) you get nothing useful from a modest AMD iGPU.
Maybe for newer and future AMD iGPUs you can get notable performance, and that will be enough for someone to code Ollama build routines for them.
To enable Ollama for AMD iGPU you will need these three things:
Enable using main memory for the iGPU with "-DLLAMA_HIP_UMA=on"
Trick ollama about how much VRAM you have, because you will use main RAM; VRAM measurements are meaningless here
Force ollama build for your iGPU
Have a read of this post https://github.com/ollama/ollama/issues/2637#issuecomment-2067766641 and this one https://github.com/ollama/ollama/issues/2637#issuecomment-2102113224; they are the most relevant for your situation.
@arilou commented on GitHub (May 19, 2024):
I have just run into this thread. I have been playing with llama.cpp and ollama as well. Here is my setup:
I added the following patch to ollama:
Then I built using Docker:
Then I extracted the binary from the docker image using docker export.
In addition, I downloaded the ROCm parts:
https://github.com/ollama/ollama/releases/download/v0.1.38/ollama-linux-amd64-rocm.tgz
The next part is that I took https://github.com/segurac/force-host-alloction-APU and built it:
Then copy the output libforcegttalloc.so to the same folder.
So now you should have a folder with:
Now, to start ollama I would use the following (change the VRAM according to your system memory). You no longer need to take away a lot of memory from your system RAM for the VRAM, as it will simply use the system RAM; it should not affect the speed, as the iGPU sits on the same SoC as the CPU...
I have also tried playing with XNACK, which as far as I understand should make memory access faster as pages "migrate" to the VRAM (with XNACK I think libforcegttalloc won't be required).
But I could not get it to work properly, as there seems to be a bug in the amdgpu driver. If you want to play with it, you need to add this to your kernel params:
And then, to execute ollama, change HSA_XNACK to 1.
(To validate that you have XNACK you can execute HSA_XNACK=1 rocminfo | grep xnack; you will see :xnack+ instead of :xnack-.)
I opened a ticket for the AMD driver in the hope of getting some feedback from them:
https://gitlab.freedesktop.org/drm/amd/-/issues/3386
If you want to "force" enabling XNACK you can change in the docker build from AMDGPU_TARGETS="gfx900" to AMDGPU_TARGETS="gfx900:xnack+"
I will be happy to hear your results. Sorry I did not do the extra step to make it all run from Docker (i.e., passing the env and libforcegttalloc.so into the container), but I ran out of time to play with it more.
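A hedged sketch of the kind of launch command this setup implies (OLLAMA_VRAM_OVERRIDE comes from the patch described above and is not part of stock ollama; the 55 GiB figure mirrors a later comment in this thread):
export HSA_OVERRIDE_GFX_VERSION=9.0.0
export OLLAMA_VRAM_OVERRIDE=$((1024*1024*1024*55))     # pretend 55 GiB of "VRAM"
LD_PRELOAD=./libforcegttalloc.so ./ollama serve        # force host/GTT allocations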
@qkiel commented on GitHub (May 19, 2024):
This is a great idea; we can now change totalMemory on the fly.
I would only make OLLAMA_VRAM_OVERRIDE more human-readable. The value of totalMemory is compared to IGPUMemLimit from the ollama/gpu/gpu.go file, which is defined like this:
Similarly, in your code you can calculate totalMemory like this:
And set env OLLAMA_VRAM_OVERRIDE=55.
Question about the .git folder
When I download the source for an ollama release, for example 0.1.38, there's no .git folder in the archive. I usually get it from the master branch:
But that didn't work for ollama 0.1.38:
Is there a way to re-create a proper .git folder? A simple git init in the ollama-0.1.38 folder doesn't work.
@arilou commented on GitHub (May 20, 2024):
It's simpler to just clone the git repository and check out whatever tag/branch you want to build.
@qkiel commented on GitHub (May 20, 2024):
You're right, you can clone the repo with a particular tag:
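For example (using the release tag discussed above):
git clone --depth 1 --branch v0.1.38 https://github.com/ollama/ollama.git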
Thanks for the tip.
@DocMAX commented on GitHub (May 20, 2024):
Meanwhile, is it possible to fix this? (Leaving VRAM on "auto" in the BIOS.)
time=2024-05-20T10:04:10.752Z level=INFO source=amd_linux.go:233 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
@arilou commented on GitHub (May 20, 2024):
Not sure I'm following; if you use the VRAM override env and also libforcegttalloc.so to expose your entire system memory, it won't show this print.
The code verifies you have at least 1GB of RAM... pretty sure your system memory has more than 1GB.
@DocMAX commented on GitHub (May 20, 2024):
Yes, i have 64GB RAM. Full output:
@arilou commented on GitHub (May 20, 2024):
It seems like you either ran without OLLAMA_VRAM_OVERRIDE=$((1024*1024*1024*55)) or you did not rebuild with my patch diff.
@DocMAX commented on GitHub (May 20, 2024):
I'm using docker image ollama/ollama:rocm
@arilou commented on GitHub (May 20, 2024):
What I suggested is a change on top of ollama. OLLAMA_VRAM_OVERRIDE is not part of ollama today...
@DocMAX commented on GitHub (May 20, 2024):
OK, then I have to wait for the Docker version, because I want to stay on Docker.
@qkiel commented on GitHub (May 20, 2024):
Curious question - why do you use libforcegttalloc.so with ollama? Isn't it only intended for use with applications that require PyTorch? Without LD_PRELOAD everything should work exactly the same.
@arilou commented on GitHub (May 20, 2024):
Well, the reason is that if you look when you compile ollama/llama.cpp (even with LLAMA_HIP_UMA=on), it will charge the VRAM memory (you can check with radeontop/amdgpu_top).
That limits you to the amount of VRAM you can assign in the BIOS (the max is 16GB).
But let's say on my system I have 64GB of memory; there is no reason I won't be able to load much larger models, like 50GB.
After all, there is no "real" meaning to those 16GB being VRAM, they sit on the same DIMMs... so by using the trick done in libforcegttalloc.so you basically charge memory from the OS, but since we are all here with APUs it's the same thing; we just need to go through HIP in order for the iGFX to understand it can access those pointers regularly.
So with this trick you can now load much bigger models, and "steal" less memory from your system for your GPU.
So for example, I loaded llama3:70b-instruct-q4_K_M, which is about 40GB, and I still get 0.8 tps, which is fairly OK for the power of our iGPU...
@qkiel commented on GitHub (May 20, 2024):
Interesting. I have an AMD 5600G APU with UMA_AUTO set in UEFI/BIOS (which means 512 MB is taken from my RAM for VRAM). On my Ubuntu 22.04 the libforcegttalloc.so is required only for Stable Diffusion apps like Fooocus.
Running ollama with or without LD_PRELOAD makes no difference in my case. VRAM is kept at 512 MB, models are loaded to RAM, and the compute is done on GPU.
Have you tried running ollama without LD_PRELOAD?
@arilou commented on GitHub (May 20, 2024):
Interesting. For me it crashes if I try to load a model bigger than the allocated VRAM; I wonder if it's an issue because in Fedora 40 the default ROCm is 6.0.
@Jonnybravo commented on GitHub (May 20, 2024):
Hey, I also have a 5600G and I wanted to make use of it. I read the whole thread, but I'm confused about which steps I should take to change the build in order to make this work with this iGPU.
Is there a pre-compiled version that everyone can use? I tried to follow @qkiel's steps, but this fails miserably when I try to compile and build using go...
@qkiel commented on GitHub (May 20, 2024):
When you download source code and compile it with commands below, do you still get an error?
@Jonnybravo commented on GitHub (May 20, 2024):
Last time I ran the generate command I got this:
I didn't try anything else after this. Before I got here, the generate command would give me an error related to the wrong version of go being installed.
@qkiel commented on GitHub (May 21, 2024):
I updated my instruction a bit, see if it works this time. If not, I can send you my binary.
@Jonnybravo commented on GitHub (May 21, 2024):
I followed everything again and made sure about the versions of the requirements. This time I managed to pass the generate step, but it seems that I have a problem with the goroot path when I try to run the builder command:
Could it be because of this being installed in a custom folder?
EDIT: Meanwhile I tried to point both variables to different paths, but now I have an error on GOPROXY. @qkiel, did you also experience this? What are your paths for each of the variables GOROOT, GOPATH and GOPROXY?
@qkiel commented on GitHub (May 21, 2024):
This warning doesn't matter, just run ollama:
Then in a second terminal window:
@Jonnybravo commented on GitHub (May 21, 2024):
I did that and I believe I'm still running it using the CPU. Is there a way to confirm that I'm running this using the GPU?
@qkiel commented on GitHub (May 21, 2024):
Look at GPU utilization. I use nvtop for that (also available as a snap):
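For example (assuming the snap package; it is also available in the Ubuntu repositories on recent releases):
sudo snap install nvtop
nvtop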
@Jonnybravo commented on GitHub (May 21, 2024):
Installed it, but I think I messed up on the ROCm installation. Do I need to do some extra step besides this?:
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.1.1/ubuntu/jammy/amdgpu-install_6.1.60101-1_all.deb
sudo apt install ./amdgpu-install_6.1.60101-1_all.deb
If it helps, I'm using Windows with WSL.
@qkiel commented on GitHub (May 21, 2024):
Unfortunately, there is no equivalent of the HSA_OVERRIDE_GFX_VERSION environment variable on Windows, so you cannot present your iGPU to ROCm as supported.
Secondly, you install ROCm differently on Windows.
I don't think it can be done on Windows the same way as on Linux.
Edit:
Besides that, you can try this to install ROCm:
But chances of success are very slim.
@Jonnybravo commented on GitHub (May 22, 2024):
Yeah, I tried it and got the same problem mentioned on this thread: https://github.com/ROCm/ROCm/issues/3051
And what about a Virtual Machine running linux, @qkiel? Do you think that could work or am I stretching too much here?
@qkiel commented on GitHub (May 22, 2024):
I have no idea. If you have a regular GPU, then you can pass iGPU to the VM and that could work. I don't think that 5600G supports SR-IOV so you can't partition iGPU and pass only part of it.
@xwry commented on GitHub (May 26, 2024):
You can try radeontop; it works fine on AMD iGPUs, and the -c flag adds colorized output.
sudo apt install radeontop
@smellouk commented on GitHub (Jun 19, 2024):
@qkiel thx for this tip 🙏
I followed the steps as you described, but I'm facing this error:
My current setup:
What is crazy now is that if I install Docker in this LXC and run docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama --device=/dev/kfd --device=/dev/dri/renderD128 --env HSA_OVERRIDE_GFX_VERSION=9.0.0 --env HSA_ENABLE_SDMA=0 ollama/ollama:rocm, everything works great. More details here: https://github.com/ollama/ollama/issues/5143#issuecomment-2179538572
@qkiel commented on GitHub (Jun 20, 2024):
@smellouk I have tutorials on how I install ROCm and Ollama in Incus containers (a fork of LXD):
Do you do this similarly or somehow differently?
@smellouk commented on GitHub (Jun 21, 2024):
@qkiel I used that article, and I just noticed you are the owner 😆; that AI tutorial, ROCm and PyTorch on AMD APU or GPU, led me here. I followed everything and have the same issue 😢
@qkiel commented on GitHub (Jun 21, 2024):
When you run this command, what do you see?
Do card0 and renderD128 belong to the video or render group? If they belong to root root, that means you didn't set a proper gid when adding the GPU device to the container. For Ubuntu containers, that would be:
Or your user inside the container doesn't belong to the video and render groups. For Ubuntu containers, that would be (this requires a restart of the container to take effect):
@smellouk commented on GitHub (Jun 22, 2024):
@qkiel permissions are correct as expected
@arilou commented on GitHub (Jun 25, 2024):
@dhiltgen perhaps you want to consider adding this patch to ollama? (I don't have any NVIDIA computer to test with and do the same for CUDA, or whatever Intel has/will have), but I know it works well for AMD.
@wszgrcy commented on GitHub (Jul 28, 2024):
I can set the environment variable HSA_OVERRIDE_GFX_VERSION and run ollama via msys2, and it correctly recognizes the graphics card. However, when executing 'ollama run', it reports that gfx1103 is missing.
@dhiltgen commented on GitHub (Aug 1, 2024):
@wszgrcy you've hit #3107
@MaciejMogilany commented on GitHub (Aug 7, 2024):
On ollama version 0.3.2, with kernel 6.10.3-3 on an Arch-based distro, and with the ollama environment variables as below:
Environment="OLLAMA_HOST=0.0.0.0"
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.1"
Environment="HSA_ENABLE_SDMA=0"
Environment="HCC_AMDGPU_TARGET=gfx1101"
Environment="OLLAMA_FLASH_ATTENTION=1"
Probably only Environment="HSA_OVERRIDE_GFX_VERSION=11.0.1" is needed for an RDNA 3 GPU like the 780M.
No other hack applied (official ollama) on a Ryzen 7 7840HS with the 780M integrated GPU; LLMs load straight into GTT memory, bypassing the allocated VRAM, as below:
ollama ps
NAME ID SIZE PROCESSOR UNTIL
llama3.1:8b-instruct-q5_K_M 3cfab818fbe8 7.7 GB 100% GPU 4 minutes from now
amdgpu_top
...
Memory Usage
VRAM: [ 1309 / 16384 MiB ] GTT: [ 7215 / 40105 MiB ]
...
The problem now is that the memory available to ollama is calculated from the VRAM allocated in the BIOS; in my case I can use only 16GB, despite having 42GB of GTT available, plus 16GB of wasted, unused VRAM which is bypassed on kernel 6.10.
Is there any chance to alter the script that allocates total memory for APUs on 6.10+ kernels so it is based on GTT, not VRAM?
This would allow using the default VRAM carveout of 0.5 GB, so no memory is wasted, and would load a full 70B q4 model into GPU memory (on a 96GB RAM system).
At this stage, using the APU's GPU is feasible when the model fits in GTT memory, which on a 96GB RAM system is around 47GB with a 1GB VRAM carveout. In my opinion, if the model fits in GTT, use the GPU; if not, use the CPU. I see some instability with big quantized models above 70B that offload partially to the GPU on a system with 96GB of RAM. This logic would mitigate that, as splitting a model between CPU and GPU requires 2x the RAM (two copies, one for CPU and one for GPU) on consumer APU systems (they do not support XNACK).
@sebastian-philipp commented on GitHub (Aug 7, 2024):
Relates to https://github.com/ggerganov/llama.cpp/issues/7145. I'd search for DLLAMA_HIP_UMA in the comments above, as they re-compile llama with the needed flag. And yes, I'm also still in CPU-only mode here and waiting for any progress with Vulkan or this issue.
@MaciejMogilany commented on GitHub (Aug 7, 2024):
This is mitigated in the 6.10 kernel; that's why I mention it (no hack needed): https://www.phoronix.com/news/Linux-6.10-AMDKFD-Small-APUs
@MaciejMogilany commented on GitHub (Aug 7, 2024):
I tested ollama with the compile flag DLLAMA_HIP_UMA and it allows all GTT memory to be used on the 6.10 kernel.
@sebastian-philipp commented on GitHub (Aug 7, 2024):
still pretty unfortunate that llama needs to be recompiled for this.
@sebastian-philipp commented on GitHub (Aug 7, 2024):
pretty please @Djip007
@MaciejMogilany commented on GitHub (Aug 9, 2024):
I made a mistake; I don't know why one compilation gave me full GTT memory as GPU memory in ollama. I made so many compilations and did so much tinkering in the code.
I cloned a fresh copy of Ollama and got these results:
ollama v0.1.3.4 commit de4fc29 and llama.cpp commit 1e6f6544 (Aug 6, 2024), no UMA applied: Ollama sees only 16GB of GPU memory (I set it in the BIOS). It loads the LLM model into GTT memory on kernel 6.10.
ollama v0.1.3.4 commit de4fc29 and llama.cpp commit 1e6f6544 (Aug 6, 2024), with flag -DGGML_HIP_UMA=on: Ollama sees only 16GB of GPU memory; amdgpu_top doesn't see GTT or VRAM memory filled when the LLM model is loaded.
On the 6.10 kernel, DGGML_HIP_UMA=on is not needed to use shared GTT memory.
Is there any chance to alter the gpu/amd_linux.go script so it allocates total memory for APUs on 6.10+ kernels based on GTT, not VRAM?
@MaciejMogilany commented on GitHub (Aug 9, 2024):
Made a PR that enables full GTT allocation on gfx1103 and gfx1035 APUs on 6.10+ kernels.
@Djip007 commented on GitHub (Aug 14, 2024):
As @MaciejMogilany reports, on Linux:
With GGML_HIP_UMA=on, llama.cpp uses RAM, not VRAM or GTT; otherwise it uses VRAM+GTT.
And you can adjust GTT with a Linux boot parameter: amdgpu.gttsize=65536 allows use of 64GB of GTT (by default it is 1/2 of RAM).
(For example, @MaciejMogilany can reduce the BIOS VRAM to 2-4GB and configure amdgpu.gttsize=81920 to have 80GB of GTT.)
@MaciejMogilany commented on GitHub (Aug 14, 2024):
Only GTT.
If the model exceeds 50GB (or maybe is bigger than 1/2 of RAM), ollama becomes unstable. Is increasing the GTT size feasible? Maybe, if you need to fit a few small models with small context windows.
@Djip007 commented on GitHub (Aug 14, 2024):
GTT size can be increased with the boot kernel param amdgpu.gttsize=NNN (in MByte).
I always think it is rocBLAS that is unstable (because unsupported?). I have to do more tests with llamafile.
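A hedged sketch of setting that kernel parameter on a GRUB-based distro (the 65536 value is the 64GB example above; adjust for your bootloader):
# in /etc/default/grub, append to GRUB_CMDLINE_LINUX_DEFAULT:
#   amdgpu.gttsize=65536
sudo update-grub && sudo reboot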
@Djip007 commented on GitHub (Aug 14, 2024):
Strange...
I allocated 56GB of GTT:
(test with llamafile, not ollama)
But it doesn't look like I can use more GTT than 1/2 of RAM for hip_alloc... (Linux framework 6.10.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Aug 5 14:30:00 UTC 2024 x86_64 GNU/Linux)
And I sometimes get crashes even with a small model (Mistral-7B-Instruct-v0.3.F16.gguf) and tinyblas (i.e., without rocBLAS).
UMA can allocate more RAM.
Maybe a "bug" in the amdgpu/kfd driver.
(Note: tested with --no-mmap.)
@Djip007 commented on GitHub (Aug 20, 2024):
OK, I have something now that "works":
now:
Install and activate rocblas-gfx1103: module load rocm/gfx1103
I tested with llamafile-0.8.13 (patched) and Meta-Llama-3-70B-Instruct.Q4_K_M.gguf (42520393152 bytes).
So it succeeded in loading the full model on GPU/GTT. I was not limited to 1/2 of RAM... it looks to have allocated ~42GB on GTT.
I don't have ollama, but it may be good news!
@MaciejMogilany commented on GitHub (Aug 23, 2024):
With
Around 80GB of GTT
the biggest model I was able to load was mistral-large:123b-instruct-2407-q3_K_S
it takes around 71GB of RAM
but only with llamafile and --no-mmap flag
This same model with mmap almost doubles the memory requirement, as it adds a 50GB CPU buffer that makes memory page indefinitely.
ollama doesn't have the ability to pass --no-mmap to llama.cpp. The maximum I was able to load was mistral-large:123b-instruct-2407-q2_K (47 GB of GTT), and it often crashed with bigger context windows. Max memory on a small APU is 96GB. 47GB model in GTT + CPU buffer around 40GB + system + KV buffer = :( There are feature requests to add no-mmap to ollama.
Another thing is instability. The worst offenders are gemma2 models. gemma2 27B crashes graphics on load with 50% probability on ollama, but llama3.1 70B q4 runs solid.
@Djip007 commented on GitHub (Aug 24, 2024):
Oh sorry, I forgot to report that... Yes, it is needed (until we manage to get direct access to it; there was a test, so I know it can work...).
well... not really:
You have 4 DDR slots, so 192GB is possible 😉
Yes... I don't understand why... I don't know how to debug that, and have no idea what it can be. Maybe I can find some time to create a specific backend for RDNA3 APUs...
@rmcmilli commented on GitHub (Aug 25, 2024):
Added the patch from your PR and it works great so far on my AMD 7840U. Thanks!
@MaciejMogilany commented on GitHub (Aug 26, 2024):
Can you test again with the latest commits? I have added --no-mmap for APUs and subtracted GTT from the RAM size. For the first time, I was able to (partially) load mistral-large:123b-instruct-2407-q4 to the GPU without a hang or crash of ollama. I am not sure if this is the correct approach, but from initial testing ollama seems more robust.
The default GTT runs more stably for me.
@N4S4 commented on GitHub (Aug 26, 2024):
Hello,
Not sure if this is the right place. I am experiencing an issue where, when setting the override via a systemd edit of ollama.service, it does not take effect after a restart, on Ubuntu 24, Ryzen 9 7940HS, Radeon 780M.
output from journalctl -u ollama --no-pager
ollama[49742]: 2024/08/25 08:15:47 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:*
HSA_OVERRIDE_GFX_VERSION: remains empty
could someone help me?
@MaciejMogilany commented on GitHub (Aug 26, 2024):
@rmcmilli commented on GitHub (Aug 26, 2024):
I've rebuilt with the new commit, but now I'm having trouble getting HSA_OVERRIDE_GFX_VERSION to take effect. The last image I built looks fine, but the new one doesn't ignore the compatibility check when the environment variable is set.
I'm using Docker, by the way, so it might be a local issue.
Edit: Confirmed local issues. Rebuilt and running now without issue.
@N4S4 commented on GitHub (Aug 26, 2024):
Strange, I did the same steps before, but now it worked; the variable is set.
Now I get this from the journalctl -u ollama.service | grep gpu output:
set DEBUG
INFO source=amd_linux.go:274 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
Not sure why it is skipping my iGPU, any idea?
@MaciejMogilany commented on GitHub (Aug 26, 2024):
This is not the place for this.
You may go to your BIOS and increase the VRAM size, as ollama officially bypasses integrated graphics. It does this by checking if VRAM is smaller than 1GB, and AMD sets VRAM to 512MB by default. Another way is to install a 6.9.9+ kernel and compile ollama from this draft: ed05e507bc; then changing VRAM in the BIOS is not needed.
@Xeroxxx commented on GitHub (Sep 15, 2024):
This makes it work for me on my AMD Ryzen 5 Pro 2400GE! Thank you. I just needed to increase the VRAM in the BIOS.
Running on a three-node Kubernetes cluster with the AMD Device Plugin.
@yookoala commented on GitHub (Oct 4, 2024):
Is there a way to force the shared memory allocation so the model can be run on the GPU instead of the CPU?
I'm running a Ryzen 8700g with 96GB RAM. But I cannot seem to run a ~40GB model on GPU.
@MaciejMogilany commented on GitHub (Oct 4, 2024):
Compile this branch: https://github.com/ollama/ollama/pull/6282#issue-2457641055. Everything is in the PR description.
@DocMAX commented on GitHub (Oct 5, 2024):
Is GTT possible with kernel 6.8.12? I am on Proxmox, that's why...
@ericcurtin commented on GitHub (Oct 5, 2024):
But how much VRAM do you have? GPU workloads use VRAM rather than RAM
@yookoala commented on GitHub (Oct 7, 2024):
GPU in an APU has no dedicated VRAM. It uses shared memory on the motherboard. I want to allocate as much as possible to the GPU for the use here (say 64GB).
@robertvazan commented on GitHub (Nov 2, 2024):
Are you sure this actually uses the BIOS-defined VRAM rather than GTT? Can you check it with radeontop? If I try this with a fixed 8GB VRAM allocation in the BIOS, Ollama will use the iGPU, but the model is loaded into GTT while the BIOS-configured VRAM is ignored. HSA_ENABLE_SDMA does nothing for me. If I don't reserve VRAM in the BIOS, Ollama does not use the iGPU at all. I have a Ryzen 5600G and Fedora.
@Sebazzz commented on GitHub (Dec 4, 2024):
When I boot I see this:
Ollama won't use GTT memory? Or are you saying it needs to detect separate VRAM, but does not actually use it?
@robertvazan commented on GitHub (Dec 5, 2024):
@Sebazzz You have to build #6282 yourself for now to use AMD iGPU.
@Sebazzz commented on GitHub (Dec 7, 2024):
Okay, for anyone else attempting this: I got this working on a Minisforum UM480XT (AMD Ryzen 7 4800H with Radeon Graphics; gfx90c).
I followed these steps:
- Install the driver stack and ROCm with amdgpu-install --usecase=dkms,multimedia,rocm,opencl,openclsdk,mllib,hip,lrt
- Build with make -j 5; go build .
- Set LD_LIBRARY_PATH so the compiled libggml_rocm is picked up, otherwise I ran into /tmp/ollama3477896540/runners/rocm/ollama_llama_server: error while loading shared libraries: libggml_rocm.so: cannot open shared object file: No such file or directory
- When setting these variables for testing with export, don't forget to quote the versions (e.g. export HSA_OVERRIDE_GFX_VERSION="9.0.0").
Other things:
- amdgpu_top is a great utility to watch GPU usage.
The only thing I still need to check is how to install this in a more permanent fashion, specifically which files need to go where.
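Collected as a script for convenience, a sketch of those steps (assuming a checkout of PR #6282 and the 9.0.0 override for gfx90c as in the comment; the library path is illustrative and will differ per build and distro):

```sh
# Install the AMD driver stack and ROCm userspace.
sudo amdgpu-install --usecase=dkms,multimedia,rocm,opencl,openclsdk,mllib,hip,lrt

# Build Ollama from the PR #6282 checkout.
make -j 5
go build .

# Point the loader at the freshly built libggml_rocm.so (path is illustrative).
export LD_LIBRARY_PATH="$PWD/build/lib/ollama:$LD_LIBRARY_PATH"

# Quote the override version, then start the server.
export HSA_OVERRIDE_GFX_VERSION="9.0.0"
./ollama serve
```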
@sebastian-philipp commented on GitHub (Dec 7, 2024):
@Sebazzz: Can you share some performance benchmarks with and without the iGPU (including some hardware info)?
@Sebazzz commented on GitHub (Dec 7, 2024):
I'm happy to. Is there a defined approach for this?
@Sebazzz commented on GitHub (Dec 7, 2024):
That last premise seems to be the case. It needs to detect the 1G VRAM but then dumps everything in GTT. Not sure if that matters for performance, but it is a waste of 1G of system memory.
@petrm commented on GitHub (Dec 9, 2024):
I applied the following patch instead, as I have no use for dedicated video RAM:
@Cirius1792 commented on GitHub (Jan 7, 2025):
I don't know if there is any standard method, but one that worked for me is the following:
for run in {1..10}; do echo "where was beethoven born?" | ollama run tinyllama --verbose 2>&1 >/dev/null | grep "eval rate:"; done
(found here)
@robertvazan commented on GitHub (Jan 7, 2025):
Prompt processing is heavily parallelized and its performance also depends on total length of the prompt. I would recommend testing with at least 1K input tokens. Try to ask for a summary of some article.
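A sketch of that suggestion, extending the benchmark loop above to use a longer input (article.txt and the model are placeholders; the --verbose stats land on stderr, which is why the redirection order matters):

```sh
# Each run prints both the "prompt eval rate:" and "eval rate:" lines from --verbose.
for run in {1..5}; do
  (echo "Summarize the following article:"; cat article.txt) \
    | ollama run tinyllama --verbose 2>&1 >/dev/null \
    | grep "eval rate:"
done
```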
@DocMAX commented on GitHub (Feb 16, 2025):
TL;DR: what's currently the easiest way to run Ollama on an AMD APU (5800U)? Thanks.
@rjmalagon commented on GitHub (Feb 17, 2025):
Shameless plug... I intend to maintain patches that make it easier to run Ollama on AMD APUs.
https://github.com/rjmalagon/ollama-linux-amd-apu
@Sebazzz commented on GitHub (Feb 17, 2025):
How is that different from the PR?
@MaciejMogilany commented on GitHub (Feb 17, 2025):
If it's duct-taped, then make a high-quality patch yourself, and don't use this duct-taped code in your container. Don't insult anyone who tries to push things further.
@lanwin commented on GitHub (Feb 18, 2025):
Is it really worth it? I mean, is the performance on the APU really better than processing directly on the CPU?
@AwesomenessZ commented on GitHub (Feb 18, 2025):
Yes, it's at least double IMO on the 5600G. Plus, other things can use the CPU during that time.
@lanwin commented on GitHub (Feb 19, 2025):
@AwesomenessZ would you share your setup? I tried to get it running on my 4750G and it felt slower.
@DocMAX commented on GitHub (Feb 19, 2025):
As far as I could tell from running AI tasks on my 5800U APU, I didn't see any big performance difference; it was almost the same as the CPU. Does anyone else have other experiences?
@MaciejMogilany commented on GitHub (Feb 19, 2025):
Prompt ingestion is much quicker, and the PC runs cooler; ask it to summarize one page of text to see the difference. Still, the APU is not up to the game. The next generation should be better, as the latest AMD AI APUs have 4-channel memory (more bandwidth).
@AwesomenessZ commented on GitHub (Feb 19, 2025):
@lanwin
Docker compose:
But in the past few weeks I have been having GPU driver crashes, unfortunately.
@lanwin commented on GitHub (Feb 19, 2025):
Strange. I had that working a few weeks ago. Now I get
"/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory"
@rjmalagon commented on GitHub (Feb 28, 2025):
Not really a problem; it will work without that file, but you can map it in from a local copy. I use a copy from my host.
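If you do want to silence that message in a container, a sketch of bind-mounting the file from the host (assuming the host has it at the same path; otherwise copy it somewhere and adjust the source side of the mount):

```sh
# Bind-mount the host's amdgpu.ids into the container read-only.
docker run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  -v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids:ro \
  -v ollama:/root/.ollama -p 11434:11434 \
  ollama/ollama:rocm
```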
@malteneuss commented on GitHub (Mar 19, 2025):
Is it normal that the iGPU is slower than the CPU? I have an AMD Ryzen 7 7735HS with an AMD Radeon 680M (gfx1035), and with e.g. the qwen2.5-coder:3b model I get 14 tokens/s on the iGPU and 15 tokens/s on the CPU. I'm using NixOS with Ollama 0.6.0, following the installation instructions in https://wiki.nixos.org/wiki/Ollama.
@taweili commented on GitHub (Mar 20, 2025):
Yes, for the 7735HS the iGPU gives about the same tokens/s as the CPU. The only advantage is that it frees up the CPU to process other things while the model runs on the GPU.
@ivanbaldo commented on GitHub (May 21, 2025):
Shouldn't this be closed now? Isn't AMD iGPU already available now?
@DocMAX commented on GitHub (May 21, 2025):
No, iGPU not supported yet
@rjmalagon commented on GitHub (May 21, 2025):
AMD iGPUs are a niche. On Linux (current kernels) there is GTT memory support for ROCm, which allows almost all of main memory to be used by the GPU.
But it needs proper kernel configuration, plus extra memory management routines in Ollama; without the latter, Ollama only uses the VRAM assigned by the UEFI/firmware.
Just for info: Fedora 42 has everything in place (newer Linux kernel, current ROCm access, etc.), but a conservative Ubuntu 22.04 LTS misses GTT support for ROCm.
@Crandel commented on GitHub (Jun 11, 2025):
When will iGPUs be supported?
@ddpasa commented on GitHub (Jun 11, 2025):
You can use iGPUs via Vulkan; llama.cpp has supported Vulkan for months. You can either use the Vulkan fork of Ollama, or better, just use llama.cpp directly.
@ligjn commented on GitHub (Jun 25, 2025):
@ddpasa I didn't find an ollama branch named vulkan in the repository. Could you provide a link? Thank you.
@ddpasa commented on GitHub (Jun 25, 2025):
Ollama doesn't support Vulkan; the devs don't care.
Which is okay, because Ollama is just a thin wrapper around the llama.cpp server. You can just use llama.cpp directly: https://github.com/ggml-org/llama.cpp
@Crandel commented on GitHub (Jun 25, 2025):
@ligjn You can try my fork here for AMD iGPU; it works very well, and I've created Arch Linux AUR packages too.
I'm giving up on maintaining my fork for AMD iGPU. I've switched to llama.cpp + llama-swap and I'm using the Vulkan backend now. It looks like the devs have no interest in anything except Nvidia and corporate support at the moment.
@ericcurtin commented on GitHub (Oct 13, 2025):
We added Vulkan support to docker model runner, so we cover this feature:
https://www.docker.com/blog/docker-model-runner-vulkan-gpu-support/
We've also put effort into putting all our code in one central place to make it easier for people to contribute. Please star, fork, and contribute.
https://github.com/docker/model-runner
We have Vulkan support. You can pull models from Docker Hub, Hugging Face, or any other OCI registry, and you can also push models to Docker Hub or any other OCI registry.
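For reference, a minimal sketch of trying it (the model name is illustrative; check the docker/model-runner docs for the current CLI and available models):

```sh
# Pull a model from Docker Hub's ai/ namespace and run a one-off prompt.
docker model pull ai/smollm2
docker model run ai/smollm2 "Why is the sky blue?"
```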
@Djip007 commented on GitHub (Oct 21, 2025):
If someone wants to make a patch for AMD iGPU, there have been some changes in llama.cpp:
03792ad936/ggml/src/ggml-cuda/ggml-cuda.cu (L110-L135)
Now it is simple to use all the RAM on an iGPU; you only need to set the env var GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON, with no need for VRAM/GTT. So a simple patch could be:
or something like that. It may even work on Windows.
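A sketch of trying that without patching anything, assuming the bundled llama.cpp/ggml build reads the variable the way the linked code does (the variable and value come from that llama.cpp change, not from Ollama's own options):

```sh
# Ask ggml's CUDA/HIP backend to allocate from unified (system) memory on the iGPU.
export GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON
./ollama serve
```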
@Djip007 commented on GitHub (Oct 26, 2025):
I just can't figure out where the device RAM is computed now... does it use what ggml reports now?
@dhiltgen commented on GitHub (Nov 18, 2025):
In 0.12.11 Vulkan is now included in the official binaries, but still experimental. To enable, set OLLAMA_VULKAN=1 for the server. https://github.com/ollama/ollama/blob/main/docs/faq.mdx#how-do-i-configure-ollama-server
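On a systemd install, a minimal sketch of enabling it, following the same pattern the linked FAQ uses for other server environment variables (the drop-in content is shown as comments because systemctl edit opens an editor):

```sh
# Add the variable as a drop-in for the ollama service, then restart.
sudo systemctl edit ollama.service
#   [Service]
#   Environment="OLLAMA_VULKAN=1"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```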
@rmcmilli commented on GitHub (Nov 19, 2025):
Testing now, but FWIW this has been far more stable than ROCm in my limited testing on a 780M.