Mirror of https://github.com/ollama/ollama.git · Closed · opened 2026-05-03 by GiteaMirror · 323 comments
Originally created by @deadmeu on GitHub (Oct 8, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/738
Originally assigned to: @dhiltgen on GitHub.
I have a 7900XT and would definitely love to have ROCm support. It seems like it might be coming with https://github.com/jmorganca/ollama/pull/667?
I couldn't find a dedicated issue for this so I'm creating this one to track it.
Edit: For those interested in this feature, follow https://github.com/jmorganca/ollama/pull/814.
@jmorganca commented on GitHub (Oct 8, 2023):
Thanks for creating an issue. Keep an eye on https://github.com/jmorganca/ollama/pull/667
@65a commented on GitHub (Oct 8, 2023):
It works if you apply that patch locally and follow the updated readme/build instructions. My w7900 unfortunately had to go back to AMD for replacement because it liked to hang up in VBIOS during some boots, but I'd love to hear if you can patch locally and run it successfully. I have a RX6950 and Instinct Mi60 I am testing with currently. Should be as easy as (assuming you have ROCm and CLBlast installed):
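Judging from the commands repeated later in this thread, the build steps being referred to are roughly the following (paths assume ROCm under /opt/rocm and CLBlast's CMake config under /usr/lib/cmake/CLBlast):

```sh
# Clone the ROCm-enabled fork referenced in this thread
git clone --recursive https://github.com/65a/ollama
cd ollama

# Generate the GPU runners with the rocm build tag, then build ollama itself
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
```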
And that should give you a ROCm-compatible ollama binary in the current directory.
Some notes: if ROCm fails, it will fall back to CPU, so you want to look carefully at the logs. Let me know if you see that happen (the symptom of course would also include low tokens/s). For reference, I just tested these instructions and I get about 11 tok/s on gfx906 (Mi60) with a 25gb q8_0 model, a 7900 (gfx1100) should do much better.
@deadmeu commented on GitHub (Oct 9, 2023):
@65a thanks for the quick help. Unfortunately, it does seem to be falling back to the CPU. Here's what I did:
git clone --recursive https://github.com/65a/ollama
sudo pacman -S rocm-hip-sdk rocm-opencl-sdk clblast go
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
./ollama serve

and then in another terminal

./ollama run codellama:34b

Here's what I'm getting in the output of the server:
Two lines that stand out to me are:
2023/10/09 14:58:20 llama.go:283: llama runner not found: stat /tmp/ollama1555236189/llama.cpp/ggml/build/rocm/bin/ollama-runner: no such file or directory
{"timestamp":1696827500,"level":"WARNING","function":"server_params_parse","line":845,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":0}

As a side note I found that go was running single-threaded. Is there an easy way to add multi-threading to the `go generate` and `go build` commands?

@65a commented on GitHub (Oct 9, 2023):
Looks like a build problem, see the "not found". I actually did the same thing, make sure `-tags rocm` is provided for both go generate and go build.

@deadmeu commented on GitHub (Oct 9, 2023):
Edit: Never mind - I had the wrong paths set.

Ok, I rebuilt (and ran) the binary with the `ROCM_PATH` and `CLBlast_DIR` env vars included and am getting this new warning:

2023/10/09 15:11:40 routes.go:599: Warning: GPU support may not enabled, check you have installed install GPU drivers: rocm-smi command failed

This might be an issue with the installed ROCm packages on my system. I have this strange issue where, even though I have packages like `rocm-smi-lib` and `rocminfo` installed, I cannot run them:

@65a commented on GitHub (Oct 9, 2023):
Where is your rocminfo binary? Set ROCM_PATH when running the binary...for me it's at /opt/rocm/bin/rocminfo
@65a commented on GitHub (Oct 9, 2023):
Note also it wants `rocm-smi`, so make sure that is located at ROCM_PATH/bin/rocm-smi

@65a commented on GitHub (Oct 9, 2023):
It's working? Good to hear. At least on Arch, /opt/rocm/bin is not on the path, so it only works if you add it to PATH, or call it directly with the full path to the binary, which is the approach the code uses (also safer generally). Is your SDK also installed there? If not I can test against multiple distro-specific fallbacks if you're on a fairly mainstream distro.
@deadmeu commented on GitHub (Oct 9, 2023):
Oops, I had the wrong paths set earlier. I regenerated the tags and rebuilt with the correct paths (`ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast`) but I still get the message:

{"timestamp":1696829167,"level":"WARNING","function":"server_params_parse","line":845,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":0}

I can confirm that `/opt/rocm/bin/rocm-smi` is where my `rocm-smi` binary is.

@65a commented on GitHub (Oct 9, 2023):
Kill the binary, and do a clean run, and attach the log here. Usually you will see two llama.cpp logs, first the ROCm one, which will have some error, then it will start again with a CPU one.
@65a commented on GitHub (Oct 9, 2023):
I think I saw it correctly read the VRAM in your first log though.
@deadmeu commented on GitHub (Oct 9, 2023):
Are there any other log files written other than what's being printed in the terminal?
@65a commented on GitHub (Oct 9, 2023):
Just the terminal one is good. I'd expect to see:

But instead with some error after offload, in the first part of the log, and the "not compiled with GPU offload" error only after that run failed.
@deadmeu commented on GitHub (Oct 9, 2023):
And then when I run
`ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast ./ollama run codellama:34b` in another terminal, I get the following printed out in the server's terminal:

I don't see any `llm_load_tensors` messages.

@65a commented on GitHub (Oct 9, 2023):
This is your problem:
llama runner not found: stat /tmp/ollama1663762207/llama.cpp/ggml/build/rocm/bin/ollama-runner: no such file or directory

For some reason the binary you are running doesn't have the rocm runner embedded in it. Can you try deleting ollama and generating/building again and ensure `-tags rocm` is set? Also, `ls llm/llama.cpp/gguf/build/rocm/bin/` should show a file `ollama-runner`.

@65a commented on GitHub (Oct 9, 2023):
If the file isn't there, something went wrong with go generate. If it is there, something went wrong with go build.
@65a commented on GitHub (Oct 9, 2023):
Actually, since we see the VRAM check work, I think go generate either didn't have
`-tags rocm`, or something weird happened there... did it look like it maybe failed or something?

@65a commented on GitHub (Oct 9, 2023):
Here's what I run locally, your paths may vary (note the ./... after go generate also)
There should be a bunch of CMake output, first for CPU and then a second run that talks about HIPBLAS
then
That should result in a binary with the correct runners on a clean checkout.
@deadmeu commented on GitHub (Oct 9, 2023):
Sorry I'm not very familiar with go development. Is there a clean command which will clean up everything from both the
`go build` and `go generate` commands? Otherwise I may try to start over from a new directory 😛

Also, I can confirm `llm/llama.cpp/gguf/build/rocm/bin/ollama-runner` exists.

@65a commented on GitHub (Oct 9, 2023):
You can just run them again, at least it works for me. I think you should see the words hipblas go by on the generate command. A new directory isn't a terrible idea, because CMake does stash environment variables everywhere, so that might be part of the issue here, if rebuilding with the commands above still is missing the runner. I'll also do a clean checkout and see if I am missing something.
@deadmeu commented on GitHub (Oct 9, 2023):
It looks like hipBLAS is working:
-- HIP and hipBLAS found.

Here's the complete output for the `go generate` command: generate-log.log
@deadmeu commented on GitHub (Oct 9, 2023):
Do you know why the
`/tmp/ollama1663762207/` path is being prefixed when trying to look up `/llama.cpp/ggml/build/rocm/bin/ollama-runner`?

@65a commented on GitHub (Oct 9, 2023):
I am noticing something weird on my clean checkout, I may have borked a merge or something, let me poke at it a bit. Thanks again for testing. I can do ROCm-assisted inference with the result, but it looks like it ignored my build tags or something... Wrong code directory. Human error is real.

The path is actually an embedded binary, or should be, inside `ollama` using go-embed I think.

@65a commented on GitHub (Oct 9, 2023):
Generate output looks similar to my working output, I suspect if you run the ollama-runner directly you would get gpu inference on it. So we need to figure out why the
`go build -tags rocm` isn't embedding your binary... Starting my clean checkout over, so I'll try to reproduce as well.

@deadmeu commented on GitHub (Oct 9, 2023):
Are there any other dependencies required which I may be missing? Maybe something required for go embeds to work?
@65a commented on GitHub (Oct 9, 2023):
I don't think so. Here's my log starting with
`go build -tags rocm` and running inference successfully: https://pastebin.com/p0ZpqFEE

Are you running the ollama by typing `./ollama serve`?

@deadmeu commented on GitHub (Oct 9, 2023):
Yes, with environment variables included:
`ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast ./ollama serve`

It might be worth mentioning I also have ollama installed from the Arch repo, but for this ROCm debugging session I've been running the locally built binary.
@65a commented on GitHub (Oct 9, 2023):
ab0668293c/llm/llama.go (L28): the embed is right here. Given you have a binary that matches that glob, I would expect `go build -tags rocm` to contain it.

@65a commented on GitHub (Oct 9, 2023):
Were you able to try again with a rebuilt binary? I don't think the packaged version also being installed matters, so long as you invoke the right one.
@deadmeu commented on GitHub (Oct 9, 2023):
I've started fresh with a new directory, cloning your repo down and building again, and still nothing 😢
It looks like I'm on commit
`87b11be`, is that what you're using too?

@65a commented on GitHub (Oct 9, 2023):
Yes. On arch, I have the following packages. Also make sure that you did git clone --recursive, though I think so. It might be interesting to try to manually run
`./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner`; it has a little webui and stuff, and you can use `--help` to see the flags. That will at least show the runner is working.

Arch packages I have installed that start with ROCm:
@65a commented on GitHub (Oct 9, 2023):
I'm wondering if running the command is failing because of a missing runtime dep, and it is actually embedded correctly...
@deadmeu commented on GitHub (Oct 9, 2023):
Are you running an AMD CPU? rocminfo lists my 3900X as the first "agent" so maybe it's defaulting to that?
@65a commented on GitHub (Oct 9, 2023):
rocminfo lists my CPU as the first agent too, it's smart enough not to use it. I have AMD and Intel CPUs, integrated graphics might be an open problem though.
@deadmeu commented on GitHub (Oct 9, 2023):
What's the difference between ggml and gguf? Could the model I'm using be the problem?
@65a commented on GitHub (Oct 9, 2023):
Are you using a .gguf file? or a .ggml?
@deadmeu commented on GitHub (Oct 9, 2023):
I've tried running
`llama2` and `codellama:34b`, so whatever they are. I think they might be .ggml

@65a commented on GitHub (Oct 9, 2023):
Aha. I don't have any GGMLs around, and GGML uses CLBlast only. Let me grab one and try it. GGUF is the newer format, if you want to grab one off hugging face and try that, we might be on to something finally!
@65a commented on GitHub (Oct 9, 2023):
Thanks again for testing stuff, I will double check all the ggml paths and go generate code, if there's an obvious bug or something while I download this q2k 7b model that is probably going to actually drool :)
@deadmeu commented on GitHub (Oct 9, 2023):
I'm very new to this space so thanks for your patience and help with getting set up. What would be the easiest way to run a GGUF model with ollama? I've been relying on
`ollama run` to automatically import and set everything up for me. From what I've seen on Hugging Face, many models are provided in some multi-part .bin format.

@65a commented on GitHub (Oct 9, 2023):
Dude, I am sorry! You have found a bug in my PR. Stand by, I'll push a fix.
@65a commented on GitHub (Oct 9, 2023):
tl;dr, when they renamed the ollama-runner from server, I did that for gguf but not ggml. I just force pushed a fix, let me stare at it again and make sure I got it all.
@65a commented on GitHub (Oct 9, 2023):
Fix pushed, I would use a new directory just to be sure, you want 22d1439328b30ddace69503339f2ce6043fc3e7d
@65a commented on GitHub (Oct 9, 2023):
Note that GGUF will be faster, since it will use ROCm, and not CLBlast. You shouldn't need any environment variables for
`./ollama serve`, since /opt/rocm will be used by default and the CLBlast_DIR is just a build thing.

@deadmeu commented on GitHub (Oct 9, 2023):
Ok I deleted the old directory and re-cloned (with
`--recursive`) then checked out `22d1439328` but it's still only running on the CPU on the llama2 GGML model 🤔

Edit: 2023/10/09 17:10:15 llama.go:283: llama runner not found: stat /tmp/ollama3703397108/llama.cpp/ggml/build/rocm/bin/ollama-runner: no such file or directory

I've downloaded a GGUF model so I'll try that now as well.
@65a commented on GitHub (Oct 9, 2023):
Ok, let me try locally and see what I can find with ggml issues, I didn't test that much (as we unfortunately found out).
@65a commented on GitHub (Oct 9, 2023):
Ok, yeah, my fix didn't take for some reason, I don't see it in a clean checkout. Standby, I'll ensure I actually committed the right things before I share a commit 😄 I expect GGUF may work though already.
@deadmeu commented on GitHub (Oct 9, 2023):
I managed to import a GGUF model (
`codellama-7b.Q2_K.gguf` from https://huggingface.co/TheBloke/CodeLlama-7B-GGUF) but when I run it I still see a large amount of CPU utilisation ☹️

@65a commented on GitHub (Oct 9, 2023):
You might need
`PARAMETER num_gpu 50` or similar in the modelfile, depending on how you are running it. Logs help; did it find the runner for gguf at least?

@deadmeu commented on GitHub (Oct 9, 2023):
Sorry, there is definitely something interesting happening in the logs:
serve-my-model.log
@deadmeu commented on GitHub (Oct 9, 2023):
Could be related to https://github.com/ggerganov/llama.cpp/issues/3320?
@65a commented on GitHub (Oct 9, 2023):
I verified caae3cbab8136350da09c6d6e02c240c3a4db659 has my fix looking in the github UI, it also has a cuda fix of the same nature. You can try that one for GGML.
For the error, try adding
`AMDGPU_TARGETS=gfx1100 GPU_TARGETS=gfx1100` in front of the go generate, though I'm not sure how to do that (or if it's necessary) for CLBlast/ggml. It should help with gguf.

@65a commented on GitHub (Oct 9, 2023):
Re #3320, it's that error, try the
`AMDGPU_TARGETS=gfx1100 GPU_TARGETS=gfx1100 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...` and it will probably help.

@deadmeu commented on GitHub (Oct 9, 2023):
It works!
`AMDGPU_TARGETS=gfx1100 GPU_TARGETS=gfx1100` fixed it. It's screaming now:
I'll give your new commit a try for GGML models.
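Putting deadmeu's and 65a's commands together, the sequence that finally produced GPU inference on the gfx1100 card in this exchange was roughly the following (the AMDGPU_TARGETS/GPU_TARGETS override is the key addition; the values are card-specific):

```sh
# Regenerate the ROCm runner with explicit GPU targets (gfx1100 here; use your card's gfx ID)
AMDGPU_TARGETS=gfx1100 GPU_TARGETS=gfx1100 \
  ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast \
  go generate -tags rocm ./...
go build -tags rocm

# Run the locally built binary (not a system-wide install)
./ollama serve
# ...and in another terminal
./ollama run codellama:34b
```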
@65a commented on GitHub (Oct 9, 2023):
Thanks again for testing, I almost never use GGML, so this is great feedback.
@65a commented on GitHub (Oct 9, 2023):
It is a q2k of a 7B, but 85tok/s is pretty nice to see!
@65a commented on GitHub (Oct 9, 2023):
I'm going to try the q2k 7b ggml on a gfx1030 (RX6950XT) in a second, and see if I can get that working.
@deadmeu commented on GitHub (Oct 9, 2023):
Ok testing GGML on
`caae3cbab8` is interesting. It seems to be using some of the GPU (~30%) but still plenty of CPU. Is this expected, given what you said about the format?

@65a commented on GitHub (Oct 9, 2023):
GGML also works at caae3cbab8136350da09c6d6e02c240c3a4db659 for me, though as you can see much slower than ROCm (though gfx1100 is a beast compared to gfx1030)
Also, if you wondered how smart a q2k 7b is, here it is:
@65a commented on GitHub (Oct 9, 2023):
Thanks again @deadmeu for testing and debugging with me, but I think we got it to a good point! I might put a top level build script somewhere, there are a ton of env vars to manage for ROCm/CLBlast.
@deadmeu commented on GitHub (Oct 9, 2023):
No problem - thanks for sticking with me and seeing it through. Glad to have this working, and thanks a ton for your contributions to get it to where it is now. I'm looking forward to seeing how ollama and this space in general grows and am really keen to see ROCm mature more and get the love it deserves from AMD!
@65a commented on GitHub (Oct 9, 2023):
Re: GGML/OpenCL performance, I think it's less optimized, and it's an older copy of llama.cpp codebase, so also less optimized. However, there are a variety of OpenCL drivers including Mesa and ROCm's OpenCL driver, as well as some other ones that might use CPU. You can poke around at clinfo and OpenCL docs, but if it's using some GPU it's probably working...you can imagine why I use GGUF mainly now, given the performance delta.
@deadmeu commented on GitHub (Oct 9, 2023):
It seems like all the models in https://ollama.ai/library are GGML which is how I ended up using them. Could we maybe have those swapped over to GGUF as a preferred default?
@mchiang0610 commented on GitHub (Oct 11, 2023):
@deadmeu all the recent models have been uploaded using GGUF. The ones uploaded before the switch to GGUF are in GGML (for example, llama2 ). We will be uploading GGUF to the library for those models too soon.
@scd31 commented on GitHub (Oct 24, 2023):
I'm running into issues getting this working on my RX580 on Arch Linux.
Here's my output:
And my modelfile:
As far as I can tell it all looks fine, but my CPU is getting hit hard while my GPU remains untouched.
@65a commented on GitHub (Oct 25, 2023):
@scd31 I am not sure if ROCm still supports Polaris cards, but it's worth a try. The logs you posted are from a binary compiled without ROCm support, or from fallback after it failed. You should see "Using ROCm" near the failure.
@scd31 commented on GitHub (Oct 25, 2023):
@65a Thanks for your response. A silly question, but where should I see that?
`ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...` has lots of output but no "using rocm" as far as I can tell (checked with grep). `go build -tags rocm` doesn't return any output.

@65a commented on GitHub (Oct 25, 2023):
@scd31 try running
`./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner` directly, it's the same as llama.cpp server (if you're familiar with it; if not, try with --help). It needs -ngl 50 -model /path/to/model at least I think, and has a basic webui.

@65a commented on GitHub (Oct 25, 2023):
I have a theory that ROCm decided not to build on an older card, but you could try adjusting the AMDGPU_TARGETS and GPU_TARGETS to include your card and see what happens. Please do try the above command and see if it errors or is accelerated.
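Concretely, that check would look something like this (flags as described in the comment above; the model path is a placeholder):

```sh
# Inspect the runner's flags first
./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner --help

# Then try loading a model with GPU offload enabled (-ngl); the path is illustrative
./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner -ngl 50 -model /path/to/model.gguf
```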
@scd31 commented on GitHub (Oct 26, 2023):
The command errors - I have no
`rocm` folder in `llm/llama.cpp/gguf/build`, just `cpu` and `cuda`. I tried building with `AMDGPU_TARGETS=gfx803 GPU_TARGETS=gfx803 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...` and `go build -tags rocm` but unfortunately that folder still doesn't exist. Nothing in the output logs when I grep for `rocm`. Anything else I can try?

@65a commented on GitHub (Oct 28, 2023):
@scd31 I'm assuming you have ROCM and CLblast installed, can you pastebin the output of go generate or something?
@65a commented on GitHub (Oct 28, 2023):
If you don't see:
`-- HIP and hipBLAS found` in the generate output, I suspect you don't have it installed, or it's installed at a different path.

@scd31 commented on GitHub (Nov 4, 2023):
Sorry about the delay! I got busy.
Here's the pastebin output: https://pastebin.com/fEL3Rksi
I see no mention of HIP or hipBLAS so I suspect that's where the issue is.
Here's my
`/opt/rocm` contents:

And here's `/usr/lib/cmake/CLBlast`:

Anything there look awry? Apologies for the silly questions, I'm super inexperienced with GPGPU in general.
@65a commented on GitHub (Nov 9, 2023):
It looks fine to me. Ensure you are running
`ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./... && go build -tags rocm ./...` on a fresh checkout. Can you try building ggerganov/llama.cpp with ROCm support directly? Does that work? This is really just adding a build of that to ollama, fundamentally.

@65a commented on GitHub (Nov 9, 2023):
@scd31
`-- Build files have been written to: /home/stephen/src/ollama/llm/llama.cpp/ggml/build/cuda` is highly suspect, you are running a cuda build. Try a fresh checkout of the pull request code (not ollama HEAD), and make sure you run only the commands listed above. I suspect you are operating on the upstream codebase, given that line.

@scd31 commented on GitHub (Nov 9, 2023):
@65a That fixed it - thank you so much! No idea how I pulled down the wrong branch in the first place but it's working great now and I can see it maxing out my GPU. Thanks again for all the help!
@luantak commented on GitHub (Nov 10, 2023):
@65a Here are my logs for a vega 56 running https://github.com/jmorganca/ollama/pull/814: https://gist.github.com/lu4p/fbad0b502c070af8295f2b4b0761a888

Indeed it seems like it's not offloading to the GPU. If you need any additional context feel free to ask.
@65a commented on GitHub (Nov 10, 2023):
Yeah, it's not seeing the card for some reason:
{"timestamp":1699588706,"level":"WARNING","function":"server_params_parse","line":871,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}Can you run
/opt/rocm/bin/rocm-smi --showmeminfo VRAM --csv? It looks like it's getting some lines but not parsing them or something.@luantak commented on GitHub (Nov 10, 2023):
This is located in:
Running the following produces the same output
I installed https://archlinux.org/packages/extra/x86_64/rocm-smi-lib/
@65a commented on GitHub (Nov 10, 2023):
That looks like a broken ROCm install to me. Uninstall that package, install
`rocm-hip-sdk` and its dependencies (if not already). The fact it can't find libraries means this is a ROCm installation issue, but that package should do it.

@65a commented on GitHub (Nov 10, 2023):
Can you also share
`pacman -Q rocm-hip-sdk` if it's already installed? Possibly a packaging problem with upstream if you are using arch's `testing` repo, I haven't tried 5.7 from there yet.

@luantak commented on GitHub (Nov 10, 2023):
error: package 'rocm-hip-sdk' was not found
should I install it?
@65a commented on GitHub (Nov 10, 2023):
Yes, that should pull all the things necessary, please see https://wiki.archlinux.org/title/GPGPU#ROCm
@luantak commented on GitHub (Nov 10, 2023):
Okay done.
New problem: I checked out the latest PR and am met with
@65a commented on GitHub (Nov 10, 2023):
You might need to refetch, I pushed over the original commit with better diagnostics when you had the error. I'm not sure if you also need
`hip-runtime-amd`; you may want to check that package and `miopen-hip` are installed, I first set up python ML deps on this machine... I've been meaning to figure out the minimum installed packages for arch or debian to write a dockerfile...

@65a commented on GitHub (Nov 10, 2023):
I might have a bad merge of the generate_linux*, checking with a clean build...
@65a commented on GitHub (Nov 10, 2023):
There was a copy-paste error of the cmake directory (fix pushed), but I think your problem was just needing a clean(er) checkout, you can do something like:
This produces a binary at least here.
@luantak commented on GitHub (Nov 10, 2023):
Seems to be building now
@luantak commented on GitHub (Nov 10, 2023):
still the same
Wouldn't it be easier/ more stable to link against the library directly instead of calling the rocm-smi command?
@65a commented on GitHub (Nov 10, 2023):
If
`rocm-smi` isn't working, there's something wrong with your ROCm installation.

@65a commented on GitHub (Nov 10, 2023):
Regarding linking, Ollama doesn't really directly link to llama.cpp or any GPU stuff, that's more similar to the LocalAI/ https://github.com/go-skynet/go-llama.cpp approach (I also sent them a ROCm PR which is merged). I'd generally expect it to not work either, if the library loader is having issues running rocm-smi...something weird is happening, because rocm-smi is directly linked to the AMD libraries. Did you install the AMD "drivers" directly from AMD (or anywhere else other than Arch repositories) by any chance? This would overwrite package files on arch and cause issues...if so you may need to reinstall all of the arch packages, and I'm not sure what else. Also check
`env` for any overrides of LD_* variables, that would cause weird issues too, depending.

@luantak commented on GitHub (Nov 10, 2023):
No, normal mesa drivers from arch repos. I uninstalled everything rocm/hip related (because I probably installed them in a weird order) and then reinstalled by just installing `rocm-hip-sdk`, still no luck.

No overrides in env.
Maybe a broken package?
@luantak commented on GitHub (Nov 11, 2023):
Got it working on ubuntu now, ollama is actually fun to use now that it's fast.
@65a commented on GitHub (Nov 11, 2023):
@lu4p if you ever figure out what was wrong in the arch env, mention it here, I'm sure it could help someone else too (or file bug reports upstream). I was going to guess maybe ROCM_PATH or ROCM_HOME conflicted or were wrong or something, but really glad to see you got it working somewhere.
@luantak commented on GitHub (Nov 11, 2023):
Tried again, I'm convinced that most of rocm is just broken on arch.
Fedora is taking months to try to package rocm, because they probably test if things are actually working.
https://fedoraproject.org/wiki/SIGs/HC
I'm now a boring Ubuntu LTS user, as Ubuntu 23.10 crashes randomly every 30 minutes for me.
@UltraRabbit commented on GitHub (Nov 11, 2023):
@65a I'm using a rocm/pytorch docker image as the base container, trying to build the ollama-rocm. As this docker image is based on Ubuntu 20.04, I managed to install libclblast-dev from a PPA repository. However, it seems like there's no /usr/lib/cmake/CLBlast, which is required to be specified in CLBlast_DIR. Could I just omit this env setting and have "go generate" detect it automatically?
@paulie-g commented on GitHub (Nov 11, 2023):
This is not accurate. I am using nearly all of rocm successfully on Arch, including ollama. The only problem on Arch that I'm aware of is that the maintainers of the extras/python-pytorch multi-package broke rocm variants 2 months ago and have no intention of working on fixing it, hoping that an update to rocm 5.7 fixes it at some point in the future.
@65a commented on GitHub (Nov 11, 2023):
@lu4p It's actually useful that you tested Ubuntu though, because I hadn't tested that (or Debian Sid) yet, but plan to so I can throw this stuff into k8s with OCI containers. If you want to keep debugging your arch, it might be interesting to see
`env` (redacted, where appropriate), and make sure your user is in the group that owns the drm devices and /dev/kfd... probably `video` and `render`.

@paulie-g I developed the patch on Arch and tested across a few different installs so it definitely works on Arch, but who knows if, say, a different value for LC_COLLATE than I use triggers a bug in rocm-smi or something. I also actually filed that bug for pytorch :)
@paulie-g commented on GitHub (Nov 11, 2023):
Yes, on Arch egid in
`video` and `render` is a must. Also, env has to have a `PATH` that includes `/opt/rocm/bin` and `ROCM_PATH=/opt/rocm`. None of this is done for you on Arch (you are supposed to have sufficient clue, in line with Arch principles ;).

Embarrassingly, I wasted a huge amount of time trying to get it to work before I realised I missed the fact your patch wasn't committed to the main repo and I needed to pull from 65a/ollama.
Unlikely, but the environment not being right is. We might want to add an env.sh that people can source.
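A minimal env.sh along those lines might look like this (a sketch based on the variables discussed in this thread, not something shipped by any package):

```sh
# env.sh - source this before building or running the ROCm-enabled ollama
export ROCM_PATH=/opt/rocm
export PATH="$ROCM_PATH/bin:$PATH"          # rocm-smi, rocminfo, etc.
export CLBlast_DIR=/usr/lib/cmake/CLBlast   # only needed at build time
# Optional: pin the discrete GPU when an iGPU is also present (discussed further down)
# export HIP_VISIBLE_DEVICES=0
```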
Yeah, I noticed ;) Didn't get them to do anything though. It's especially infuriating because I chose pytorch as a way to test my rocm install originally and took the segfault to mean that my whole set-up is broken. Turns out the maintainers are broken ;) I've honestly never seen a significant package being broken in the main Arch repos that a) passes testing, and b) just remains unfixed in the hope that something in the future automagically unclobbers it. I don't want to press the issue any further because alienating the maintainers won't help. The bug has very few votes because very few users have the wherewithal to debug the coredump, which is the only way to find the bug listing, so they just let it ride.
@luantak commented on GitHub (Nov 11, 2023):
@paulie-g I know, but usually when that's the case there is a nice arch wiki page explaining how to configure the software.
Closest I found is
https://github.com/rocm-arch/rocm-arch
which isn't working. I also knew about the video and render groups from the AMD docs. I think while troubleshooting I also added the rocm bin directory to my path. And ROCM_PATH had been set.
Is there anywhere else where I should've looked for this info?
Can you provide me a list of commands to get rocm working on arch from scratch?
@paulie-g commented on GitHub (Nov 11, 2023):
No, I don't recall doing anything other than installing all the packages and ensuring all the env vars are set and the groups. That repo, iirc, is from before Arch had the packages in the official repos. If you installed anything from there, it's probably a good idea to remove and install from official repos. It's mostly useful for edge cases, trying to get Polaris support working for old cards, some non-standard software for Mi cards etc.
My point is that I, @65a, and others are successfully using rocm on Arch. It is therefore unlikely that 'most of rocm is just broken on arch'. Does your
`rocm-bandwidth-test` work, for example? If it does, and shows your card talking to the rest of the system in a sane way, then rocm works for a baseline definition of 'works'.

Checking some of your problems earlier in the thread, one problem was that you didn't have `/opt/rocm/bin` in your PATH (and probably no `ROCM_PATH` either, but I'm not sure that's necessary other than for builds). This isn't Windows, you don't just 'add' it to your env once; this works at best in the one shell process you do it in and its children (not even in the next tab you open in your term emulator). It has to be set for your log-in session, which is dependent on how you log in. I think at least one of the rocm packages might do it for you if you're not doing anything exotic, but then you have to log out and log back in before it takes effect. I am doing something exotic so I can't tell you if it works in a normal DE.

@paulie-g commented on GitHub (Nov 11, 2023):
Incidentally, a) try installing
`llama-cpp-rocm-git` from the AUR and see if it works, and b) some packages, like various rocm things, tell you to do extra things as they install - this needs to be read and done (to your question, this is likely where people read about adding things to their env, groups etc).

@iDeNoh commented on GitHub (Nov 13, 2023):
I'm at a loss, I've tried everything that I can think of to get this to generate and build properly for ROCM but so far it just doesn't seem to want to work. I can only assume I'm doing something wrong in the process or maybe my system environment isn't set up correctly because it only generates for CPU. I have rocm 5.7.1 installed, rocm-smi and rocminfo are working fine. I've got CLBlast installed but its not in the location I would have expected, mine for some reason shows up under /usr/local/lib/cmake/CLBlast. I can do the install once more and drop logs if anyone is willing to give me some guidance.
@paulie-g commented on GitHub (Nov 13, 2023):
Are you checking out from the correct repo?
@iDeNoh commented on GitHub (Nov 13, 2023):
I am, yes. My first run was using the command on the ollama website, and I noticed that it didn't see that there was a GPU at all; on the second run and every attempt after that, it registered my GPU as Nvidia, stating nvidia-smi failed.
@65a commented on GitHub (Nov 13, 2023):
The PR isn't merged yet, you need to be compiling from https://github.com/65a/ollama using the instructions at the top of https://github.com/jmorganca/ollama/pull/814
@iDeNoh commented on GitHub (Nov 13, 2023):
Understood, and I did follow the steps that are included in this thread here
@iDeNoh commented on GitHub (Nov 13, 2023):
I should note that I got an error when attempting to run go generate, perhaps this is related? I followed its instructions but I'm not sure if they are correct or not:

go: github.com/gin-gonic/gin@v1.9.1 requires
@luantak commented on GitHub (Nov 13, 2023):
Which distro and version?
Which go version are you using?
You should use a current go version (1.21), if your package manager ships an older version you can easily install it as a snap.
https://snapcraft.io/go
@65a commented on GitHub (Nov 14, 2023):
@iDeNoh what card and distro, and please link logs of a build from a clean checkout
@GZGavinZhao commented on GitHub (Nov 14, 2023):
I'd also like to note: please verify locally that the git commits of your git submodules (located in `llm/llama.cpp/{ggml,gguf}`) align with the commits shown in the `main` tree here. Sometimes submodule stuff can be messed up, causing you to build an earlier or older version of ggml/gguf that may be causing trouble only for you.

@65a commented on GitHub (Nov 14, 2023):
@GZGavinZhao there's also sometimes a problem I haven't quite pinned down where inference is really, really slow. This is usually cured if you use a completely fresh build on a fresh recursive git clone if possible...the cmake build process isn't that clean, which can probably be fixed in a different PR by ensuring things are fresh each time
`go generate` is run...

@GZGavinZhao commented on GitHub (Nov 14, 2023):
@65a In this case, if they could run the ollama server with the environment variable
`AMD_LOG_LEVEL` set to 7 (from here), maybe we can figure out something in the output.

Edit: I just tried it and I think the output from 7 is too much. `AMD_LOG_LEVEL=1 ./ollama serve` should give enough logs.

@TeddyDD commented on GitHub (Nov 14, 2023):
@65a in regard to your question in https://github.com/jmorganca/ollama/pull/814:
No, it seems that clean build fixed the issue.
I still can't run
`vicuna:13b-v1.5-16k-q5_K_M` due to an OOM error. Same thing happens when I use llama.cpp, but there I can set how many layers to offload. It's weird because I have 16gb of VRAM, this model should fit right in.

This might be an upstream issue, though. Logs just in case.
@65a commented on GitHub (Nov 14, 2023):
@TeddyDD any chance you have an AMD iGPU as well? If so I've found I need to use
`HIP_VISIBLE_DEVICES=0 ollama serve`. Setting `AMD_LOG_LEVEL=1` and sharing the full log might be interesting as well.

@TeddyDD commented on GitHub (Nov 14, 2023):
@65a No, dedicated GPU only. Here is full log with
`AMD_LOG_LEVEL=1`: log.txt

@65a commented on GitHub (Nov 14, 2023):
@TeddyDD I do see
`Device memory : required :6710886400 | free :515899392 | total :17163091968`, does it run if you reduce the layers some more? It seems like something is already using a lot of your VRAM, I guess?

@GZGavinZhao commented on GitHub (Nov 14, 2023):
@TeddyDD You may want to install
`amdgpu_top` and run `amdgpu_top` in a separate window alongside `./ollama serve` to monitor the programs that are taking up VRAM.

@TeddyDD commented on GitHub (Nov 14, 2023):
I run nvtop, my system uses ~500mb of vram on idle (the browser takes ~200mb by itself). The rest of that is free to be used by ollama. Perhaps 16k models require more than 16GB?
When running original llama.cpp it works with <= 41 layers, I can't control layers with Ollama AFAIK.
@65a commented on GitHub (Nov 15, 2023):
@TeddyDD look at the model file docs for num_gpu, it lets you override the number of layers offloaded.
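As a sketch of what that looks like (the derived model name and the layer count of 41 here are illustrative, taken from the "<= 41 layers" observation above, not from any Modelfile posted in the thread):

```sh
# Derive a model from the library one, overriding how many layers get offloaded to the GPU
cat > Modelfile <<'EOF'
FROM vicuna:13b-v1.5-16k-q5_K_M
PARAMETER num_gpu 41
EOF
./ollama create vicuna-16k-41layers -f Modelfile
./ollama run vicuna-16k-41layers
```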
@jacobkuzmits commented on GitHub (Nov 15, 2023):
I am trying to get ROCm working and I have an error running go generate on Ubuntu 22.04 with a 6950XT. The command I ran was
`ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/local/lib/cmake/CLBlast go generate -tags rocm ./...`. I changed `CLBlast_DIR` to `/usr/local/lib/cmake/CLBlast` because `/usr/lib/cmake/CLBlast` didn't exist, but I'm not sure if that's even the right dir. The only things in that directory are CLBlastConfig.cmake and CLBlastConfig-noconfig.cmake. Regardless, I had the same error trying it both ways.

error log
I am not sure what to do to fix this. apt list --installed shows that I have:
I don't know what I'm doing so I just ignored the error and ran go build anyway. That built without error and ollama was able to run and detect my VRAM, but it was slow so I assume it had to be falling back to CPU. I'll include those logs even though the issue is probably the problem above.
serve/run logs
Sending a prompt results in
@65a commented on GitHub (Nov 16, 2023):
@jacobkuzmits I don't use ubuntu, but the error you have is classic "missing -dev package problems" for Ubuntu, by any chance do you have rocm-dev installed (or whatever equivalent -dev for hip clang++?)
@65a commented on GitHub (Nov 16, 2023):
@lu4p was using Ubuntu as well, and may know how to do this better than I do on Ubuntu
@luantak commented on GitHub (Nov 16, 2023):
@jacobkuzmits
Ubuntu 22.04. Instructions
@makenwaves commented on GitHub (Nov 17, 2023):
Hey @jacobkuzmits, so after a bit of digging on the same problem you were having, I found that:
`sudo apt-get install libstdc++-12-dev` seemed to do the trick. The go compilation still throws a ton of warnings, but it compiled, and it is flying!! I'm using rocm 5.7 and running a 7900xtx. I'm kinda new to posting so I hope this is welcome :)
@iDeNoh commented on GitHub (Nov 17, 2023):
MSI 6700xt, Ubuntu 22.04 rocm 5.7, and here's install logs.txt
@65a commented on GitHub (Nov 17, 2023):
@iDeNoh a couple things, you have some go errors I don't expect, this may be related to needing to update go on your machine, I think that may be related to the snap thing @lu4p mentioned, but I don't get go mod errors locally. Your
`go build` command doesn't look like it succeeds for anything but cpu, and the `go:build` errors make me think your go version doesn't understand them properly (for ollama or tonic, it looks like?), so it only builds the cpu runner. And at the very end, you are running `ollama serve` instead of `./ollama serve` (the latter would be the result of the build, the former a system-wide install).

@iDeNoh commented on GitHub (Nov 17, 2023):
Well, I'm not sure; according to snap I have go v1.21.4 installed. I'm wondering if I've got something wrong with my ubuntu install.
@iDeNoh commented on GitHub (Nov 17, 2023):
Ah, found the culprit. I had an old version of go installed (1.16.5) that was superseding the snap version of go, deleted that and reinstalled go via snap and go generate ran without complaining, and go build was able to run successfully, @lu4p you were right, it was absolutely my version of go. fyi for anyone else in my shoes, don't rely on snap.
`go version` will show if you have an older version installed (in my case it was a preinstalled version that came with ubuntu that never got updated/replaced).

@GZGavinZhao commented on GitHub (Nov 18, 2023):
@shamb0 I don't know where you installed rocBLAS from, but I checked that as of ROCm 5.7.2 the official
`rocblas` package from AMD has `gfx1010` support. If you install ROCm using AMD's instructions, the rocBLAS errors should go away.

@shamb0 commented on GitHub (Nov 19, 2023):
Thanks @GZGavinZhao
Can I have a pointer to the ROCm 5.7.2 release?

I installed rocBLAS using the command below:
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Linux_Install_Guide.html#building-and-installing-rocblas
https://github.com/ROCmSoftwarePlatform/Tensile/issues/1165
https://github.com/RadeonOpenCompute/ROCm/issues/1714
@GZGavinZhao commented on GitHub (Nov 19, 2023):
@shamb0
If you add the repository, you should be able to just run `sudo apt-get install rocblas rocm-hip-libraries` and install `rocblas` and all of its dependencies. (Sorry, I confused myself.) You shouldn't need to build from source. `gfx1010` is already enabled by default, but lazy library loading seems to be unsupported for `gfx1010` (which is the cause of the error you linked here). To fix this, you either have to build from source with `-DTensile_LAZY_LIBRARY_LOADING=OFF`, or if you're on Debian 13/Ubuntu 23.10 install `librocblas-dev` as the OS-provided package. Your distribution may be providing pre-built packages that have the correct configuration, so if you're on Debian 13/Ubuntu 23.10 you shouldn't need to download anything from AMD. (source)

I know that rocBLAS has `gfx1010` support because, among other things, I checked the `rocblas` 5.7.0 packages and saw that files like `/opt/rocm-5.7.0/lib/rocblas/library/Kernels.so-000-gfx1010.hsaco` exist.

@shamb0 commented on GitHub (Nov 19, 2023):
Thanks a lot for the very detailed analysis @GZGavinZhao, I'll try the suggestions and get back ASAP.
@james-luther commented on GitHub (Nov 24, 2023):
When you are running ollama make sure the user you are running as is a member of both the
`video` and `render` groups. If not, you will have `rocblas` errors. My regular user is a member of these groups and when running manually things worked great, but I failed to add the ollama user to these groups and the service was having the error you mentioned.

@wijjj commented on GitHub (Nov 27, 2023):
sorry just checking in: did anybody manage to run ollama using rocm? :)
@james-luther commented on GitHub (Nov 27, 2023):
I am able to and am currently running it on Ubuntu 23.10 Server with 7900XTX.
Follow the instructions to install rocm for your distro and install. With Ubuntu you have some additional groups you need to ensure are setup but that's it.
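In shell terms, the group setup being described is roughly this (assuming a dedicated `ollama` user runs the service; substitute whichever user actually runs it):

```sh
# Give the user running ollama access to the GPU device nodes
sudo usermod -aG video,render ollama
# A new login session (or service restart) is needed before the added groups take effect
```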
@bergutman commented on GitHub (Nov 28, 2023):
Howdy! For anyone not running one of AMD's supported distributions (most of us) I have created a docker image containing a copy of CLBlast and @65a's fork of ollama with ROCm support. The image exposes the api on port 11434 but you can also bash into the container if you'd like to access ollama from a terminal. The README explains how to get everything up and running. Cheers! 🎉
https://hub.docker.com/r/bergutman/ollama-rocm
@ignacio82 commented on GitHub (Nov 28, 2023):
Thanks @bergutman . I think I'm doing something wrong. When I transcode a movie with jellyfin I can see clear activity on my gpu using
`radeontop`:
On the other hand, when I make a query to llama2 I don't see any activity:
This is what I have in my docker-compose file:
Am I missing something?
@james-luther commented on GitHub (Nov 28, 2023):
Make sure when you run Radeontop you select the correct PCI bus. When I tested it initially I didn't select a bus and it attached to the GPU integrated into my processor (7950X).
When I run
`lspci -nn | grep -E 'VGA|Display'`
From here I run radeontop with `radeontop -b 03:00.0` and I see ollama using the card.
@ignacio82 commented on GitHub (Nov 29, 2023):
Thanks @james-luther
$ lspci -nn | grep -E 'VGA|Display'
35:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [1002:1681] (rev c7)
Is my problem that my card is not compatible?
@james-luther commented on GitHub (Nov 29, 2023):
@ignacio82 no, that card is compatible with ROCm. Maybe it is a groups/permissions thing with Docker. Have you tried /dev/dri:/dev/dri in your docker-compose instead of attempting to isolate the specific card?
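A rough sketch of what that suggestion looks like in compose terms (the service name, override file, and the extra `/dev/kfd` device are assumptions, not taken from the compose file being discussed):

```bash
# write a compose override that passes the DRI/KFD device nodes through to the container
cat > docker-compose.override.yml <<'EOF'
services:
  ollama:
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
EOF
docker compose up -d
```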
@prawilny commented on GitHub (Dec 2, 2023):
@bergutman, thanks for the help in getting it working, but your image seems to be missing
`rocblas-dev` and `hipblas-dev` - after I added those, queries started being GPU-accelerated, whereas before the change the build step of `ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...` silently failed after not finding BLAS libraries.
You can view my configuration at https://github.com/prawilny/ollama-rocm-docker (I tested it with podman and the whole redhatware stack).
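For anyone following along, roughly what the fixed build step looks like with the missing dev packages added (a sketch for a Debian/Ubuntu-based image; package names are the ones named in the comment above, not the exact Dockerfile contents):

```bash
# install the BLAS dev packages the generate step silently needs
apt-get update && apt-get install -y rocblas-dev hipblas-dev

# re-run the ROCm-enabled generate step, then build the binary
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build .   # some forks also want "-tags rocm" at build time
```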
@ignacio82 commented on GitHub (Dec 3, 2023):
Yes, that made no difference.
@prawilny is your container on dockerhub?
@prawilny commented on GitHub (Dec 3, 2023):
@ignacio82, no, I didn't push it to any registry, but you can build it from this Dockerfile (note that you may need to remove line
`ENV HSA_OVERRIDE_GFX_VERSION=10.3.0`, which is a workaround specific to my GPU).
@ml2s commented on GitHub (Dec 5, 2023):
A quick question:
People say ROCm functions as a translation layer, does that mean I need to install CUDA toolkit on my Linux system in order to use ROCm? If yes, before or after I install ROCm?
@luantak commented on GitHub (Dec 5, 2023):
No.
@markg85 commented on GitHub (Dec 5, 2023):
Hi,
Sorry for "partially hijacking" this thread. It's still very much on-subject but perhaps in the wrong place.
I'm actually trying to reach @65a, but he's been smart enough to hide his email everywhere. This ping probably reaches him though :)
So I'm trying to build ollama with rocm for archlinux by modifying the Arch PKGBUILD file.
The file I have now looks like:
While compilation seems to get far, it eventually does fail with an error like this:
Besides the obvious compile error, the gfx gpu is wrongly detected too. I have a
`gfx1100`, not a `gfx1010`.
Do you perhaps have a clue on how to solve these issues?
@paulie-g commented on GitHub (Dec 5, 2023):
@markg85 You're using a compile flag that's incompatible with one of {llama,rocm,hip,cuda}. It's erroring earlier, but it'll error later too. There are a number of stack, call flow graph etc protections and similar gimmicks that break. It's likely coming from your makepkg.conf.
The gpu is not detected at all for the purposes of building, otherwise it would be a terrible package that can't be transferred to a different system with a different GPU. Since you didn't take specific steps to make sure it only builds for yours, it's building for all of them and that one just happens to be the first.
@markg85 commented on GitHub (Dec 6, 2023):
Ah, you're totally right @paulie-g!
`makepkg.conf` indeed had `fcf-protection`, which causes this error. Removing it made the compile proceed much further!
It still errors though. This time on an actual undefined variable, or so it seems:
For context, I changed the build part of the script to:
But yeah,
`llm/accelerator_none.go:20:12: undefined: errNoGPU` kinda kills the compile thoroughly. Is there something I need to do to get that variable to exist?
@paulie-g commented on GitHub (Dec 7, 2023):
@markg85 You might try editing the PKGBUILD to pull from the repo that actually has ROCm support ;) You need to pull from @65a 's repo, his ROCm support pull isn't merged into mainline yet. Not sure if that's the cause of that specific problem (and not sure why you've edited out the CGO flags), but it would be a good start.
@markg85 commented on GitHub (Dec 7, 2023):
@paulie-g lol 😂
Have another look at the PKGBUILD I posted a couple posts back. I am pulling that repo you mention. In fact, the error I'm getting is on a file that the @65a repo has, this one doesn't.
I changed the flags to be the same as the first post in this thread. There's no special other reason besides just mimicking what others in this thread did.
@paulie-g commented on GitHub (Dec 8, 2023):
@markg85 Missed that, it's not my thread ;)
@deftdawg commented on GitHub (Dec 9, 2023):
@prawilny running your image from straight podman gives an error about
`/run/podman-init` being missing (it's not present inside my container image; maybe it's something `podman compose` adds, idk).
In any case, I can get it to run by overriding the entrypoint and then running the CMD command... It works on my GPU (6900XT).
Here are the steps, I'm running on a NixOS host:
EDIT: fixed ~/.ollama not being writable with chmod 777
@prawilny commented on GitHub (Dec 9, 2023):
@deftdawg,
`/run/podman-init` is provided by `podman run`'s `--init` flag.
@zevv commented on GitHub (Dec 11, 2023):
fwiw, I was able to get AMD acceleration working on the first try on my 6600:
built with `-tags rocm`, then ran `HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve`
@markg85 commented on GitHub (Dec 11, 2023):
That trick - forcing the HSA version - works for RDNA2 GPUs, so your 6600 gpu is perfectly happy with it.
It's not an option for RDNA3 GPUs (anything starting in the 7xxx-series). I know because I tried.
@65a commented on GitHub (Dec 14, 2023):
Note, I will close and delete my branch (the patch in ROCm/rocBLAS#814) when ROCm/rocBLAS#1146 merges, as it effectively contains my patch. I am not intending to maintain a fork, and I actually am using a different approach than an ollama server now locally. Anyone building directly from my patch should fork or move to 1146 when it is merged.
@paulie-g commented on GitHub (Dec 15, 2023):
Thank you for all the work on this.
Would you mind elaborating? Curious to see what you are using instead (can be as short as you like).
@65a commented on GitHub (Dec 18, 2023):
@paulie-g Nothing special, just a local shim linking to libllama instead, similar to where Ollama is going and what go-skynet did (both of which are probably better), just can get more opinionated (aka hardcode) things like tokenizer stack and settings, mainly this is all just for learning for me, so getting to poke around libllama and understand it is interesting. The CGO linked Ollama is looking good, so I'd definitely run that for a real use case (especially when CGO is involved, more eyes is better) and having an API abstraction can be a good thing for resiliency.
@Wintoplay commented on GitHub (Dec 19, 2023):
Thanks to @deadmeu, Ollama runs on my AMD GPU. However, the shader clock is stuck at 125% as long as the terminal is running the serving model. Is that OK, or how can I fix it?
Is it because I use rocm 5.7.0?
@ReOT20 commented on GitHub (Dec 19, 2023):
Tried building on Ubuntu 22.04 LTS with ROCm 5.4.2 and RX 5700 XT masked as gfx1030. I am getting this after lots of warnings while trying to build dependencies: https://pastebin.com/VWEaBMqG
Exactly the same thing with ROCm/rocBLAS#1146
@ignacio82 commented on GitHub (Dec 24, 2023):
I finally got around to creating the image using this Dockerfile, but I don't think it is using my GPU, or at least I don't see any activity in radeontop. Any idea why, or how to debug this?
Another thing I noticed is that the webui does not work anymore.
How can I fix that?
@deftdawg commented on GitHub (Dec 24, 2023):
What card are you using? What command line are you using to start it?
Webui doesn't work because ollama picks random port numbers inside your container that you didn't publish at container start time... not yet sure how to fix that (I assume there's a variable or option somewhere).
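If it helps, the "variable or option" being guessed at is probably `OLLAMA_HOST`, which controls the address/port the server binds to. A sketch (the image name is a placeholder):

```bash
# pin the server to the published port instead of relying on defaults
docker run -d \
  -e OLLAMA_HOST=0.0.0.0:11434 \
  -p 11434:11434 \
  <your-rocm-ollama-image>
```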
@ignacio82 commented on GitHub (Dec 24, 2023):
AMD Ryzen 9 6900HX(Up to 4.9GHz), Radeon 680M Graphics,8C/16T Micro Computer, 32GB DDR5 512GB PCIe4.0 SSD
Thanks for the help!
@andy-shi88 commented on GitHub (Dec 26, 2023):
I'm on
I built the binary from the `main` branch.
I'm getting `Error: could not connect to ollama server, run 'ollama serve' to start it` when I run `ollama run mistral`.
This is the log in `journalctl -ru ollama`; it looks like it crashed and restarted the service at some point when `Lazy loading ...`
Is there some flag I need to set for this to work?
edit:
I'm able to run it if I run `ollama serve` then `ollama run mistral`,
but it fails if `ollama serve` is run from the systemd service.
@andy-shi88 commented on GitHub (Dec 26, 2023):
Oh, just realized I set `HSA_OVERRIDE_GFX_VERSION=10.3.0` in my `~/.zshrc` and I just need to set `Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"` in the service file.
It works perfectly now on my rx 6700 xt, thank you!
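For reference, setting the variable for the systemd unit (rather than in an interactive shell rc file) can be done with a drop-in; a sketch, assuming the unit is named `ollama.service`:

```bash
# open a drop-in for the service and add the override there
sudo systemctl edit ollama.service
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"

# reload units and restart so the new environment is picked up
sudo systemctl daemon-reload
sudo systemctl restart ollama
```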
@oatmealm commented on GitHub (Dec 30, 2023):
I'm having similar issues on Fedora 38:
Note the `/opt/rocm/lib/libamdhip64.so.6: version 'hip_6.0' not found`. I've tried symlinking this dynamic lib and others it was complaining about, but no luck.
rocminfo, rocm-smi etc. are all available on the path and working:
@light-on-shadow commented on GitHub (Dec 30, 2023):
@oatmealm I'm on Fedora 39, the only way I could make it all work is by purging anything ROCM from prior trials, adding version 5.6 from the RHEL 9.3 official repo, then installing the ROCM SDK from there.
Fedora 39
6950XT
@oatmealm commented on GitHub (Dec 30, 2023):
@light-on-shadow oh good. I was avoiding upgrading since I thought 39 was not supported... didn't know rhel 9.3 is compatible with f39
@light-on-shadow commented on GitHub (Dec 30, 2023):
@oatmealm RHEL 9 is based on CentOS Stream 9, which is based on Fedora 34.
ROCM is officially supported on Ubuntu, RHEL and OpenSUSE, so anything you do outside of the requirements is with a caveat anyway.
@jayk commented on GitHub (Jan 1, 2024):
Question! I have ollama running on my Manjaro linux machine with a 7900xtx and it works great. (git version as of 2023-12-25) However, I notice that once I run any model, the GPU stays pegged, consuming ~135 watts, even if I am no longer running any model.
I would have expected that once it's not running a model anymore, it would drop, but it doesn't. It doesn't drop until I kill
`ollama serve` altogether.
Is this expected behavior with ollama or is it something specific to rocm support?
@65a commented on GitHub (Jan 2, 2024):
Run Ollama with
`GPU_MAX_HW_QUEUES=1 /path/to/ollama`, or otherwise set it in Ollama's environment. This bug is upstream of Ollama, and has to do with how HIP works vs CUDA. It would be fine for Ollama to add code like `os.Setenv("GPU_MAX_HW_QUEUES", "1")` before calling into C code, as this solves the issue as well without the user having to do anything. The best solution is probably at the HIP layer, or less ideally, some ifdefs in llama.cpp or something.
This took my W7900 from a 99W idle to 18W. Graphics clocks will drop when not inferencing as expected (it's not clocked as hard by default as a 7900XTX). I couldn't determine any performance impact, still seeing 60tok/s on short prompts with 7b mistral.
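A quick way to apply and sanity-check the workaround described above (assuming `rocm-smi` is on the PATH for the power readout):

```bash
# start the server with the HIP queue cap in its environment
GPU_MAX_HW_QUEUES=1 ollama serve &

# watch power draw; it should fall back towards idle once generation stops
watch -n 1 rocm-smi
```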
@oatmealm commented on GitHub (Jan 2, 2024):
Trying to install from the RHEL 9.3 repo but I'm not sure how to set it up. Are you installing from the script or package manager? From script I'm getting [in F39]
@light-on-shadow commented on GitHub (Jan 2, 2024):
Yes from the package manager : sudo dnf install rocm-hip-sdk[...]
@oatmealm commented on GitHub (Jan 2, 2024):
And which version of amdgpu-dkms should I install? Only 6.0 seems available for rhel 9.3, and the installation fails for me with some error in the install script.
@jayk commented on GitHub (Jan 2, 2024):
This worked great. Thank you. I set the variable and now I'm down to 15W which is much happier than the 125W I was pegged at before. And my room is cooler. 😉 Thanks!!!
@oatmealm commented on GitHub (Jan 4, 2024):
Made some progress with my setup on Fedora 39 and rocm 5.6; it started compiling but ends like this:
Checking the libraries I see, etc. Any idea what could be the problem?
@light-on-shadow commented on GitHub (Jan 4, 2024):
I installed this, I believe: `rocm-hip-sdk5.6.0-5.6.0.50600-67.el9.x86_64`
When I run a locate on that lib on my setup I get this:
Seems like it's in the rocsparse/rocsparse-devel package which you should get with the SDK.
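On a dnf-based system you can confirm which package ships the library before installing it; a small sketch using the package names mentioned above:

```bash
# find the package that provides the missing rocSPARSE library
dnf provides '*/librocsparse.so*'

# then install it (normally pulled in by the SDK, but it can be installed directly)
sudo dnf install rocsparse rocsparse-devel
```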
@oatmealm commented on GitHub (Jan 4, 2024):
Yes. It's installed and ld can see it but still the last step when gcc is run fails ...
@oatmealm commented on GitHub (Jan 4, 2024):
Ok, problem solved by symlinking rocm-5.6.0 to rocm instead of using ROCM_PATH ...
Now that it's running, all models seem to fail... with the following:
@light-on-shadow commented on GitHub (Jan 4, 2024):
I've had a lot of issues in the past because of my CPU's integrated GPU; try disabling it.
But the errors seem to point to something else; again, it cannot fetch a lib.
@oatmealm commented on GitHub (Jan 5, 2024):
Hardware-wise it should be OK though, right? I was trying to check whether it's currently supported or not...
@ms178 commented on GitHub (Jan 6, 2024):
I tried to build ollama 0.1.18 release on Arch with the provided PKGBUILD, but even though I've tried various GCC and Clang compiler flags and ROCm versions (5.7.1 and 6.0), I still end up with:
`/usr/bin/ld: gguf/build/linux/rocm/lib/libext_server.a: member gguf/build/linux/rocm/lib/libext_server.a(ext_server.cpp.o) in archive is not an object` in the installation phase.
Any ideas how to fix this?
@deadmeu commented on GitHub (Jan 7, 2024):
Now that the changes in ROCm/rocBLAS#1146 have flowed through to a release and are being shipped in the Arch repo, I tried it out to see how it would go, but unfortunately am having some issues:
- With the `ollama` package installed, my GPU is not detected: `gpu.go:45: ROCm not detected: Unable to load librocm_smi64.so library to query for Radeon GPUs: /opt/rocm/lib/librocm_smi64.so: cannot open shared object file: No such file or directory.`
- With the `rocm-smi-lib` package installed, my GPU was being picked up (Radeon GPU detected) but running a model would not use my GPU.
@jameshulse commented on GitHub (Jan 9, 2024):
@oatmealm
I think this is fixed by setting the environment variable to use the older GFX version:
`HSA_OVERRIDE_GFX_VERSION=10.3.0`. You can set it in your systemd config, or if you are running `ollama serve` directly then prefix it like `HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve`. There is also another env variable, `HCC_AMDGPU_TARGET=gfx1030`, which could help.
@oatmealm commented on GitHub (Jan 9, 2024):
I've tried the first one before, but wasn't sure which version I'm supposed to use. Now that it's running, I'm not seeing the output indicating it found rocm, and when running I see this: "Not compiled with GPU offload support" ...?!!
@oatmealm commented on GitHub (Jan 9, 2024):
Ok, sorry. It's fine with
`HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx90c ./ollama serve`... the gpu I actually have. It seems to work and there's some activity shown in radeontop, but cpu goes to 50%+ all the time...
@tecosaur commented on GitHub (Jan 10, 2024):
I'm running openSUSE Tumbleweed, and once I got
`libffi7` I was able to install the SLE libraries. Following the docs, I've ended up with the following installed:
I've then run `ollama serve` (ollama 0.1.19) like so:
Attempting to run a model though, I see that while the GPU is detected, inference falls back to the CPU.
Looking in
`/opt/rocm/lib`, I see `libhipblas.so.2`, not `libhipblas.so.1`. This is having installed ROCm 6.0.
I don't suppose anybody might have any ideas about this?
@jameshulse commented on GitHub (Jan 10, 2024):
@tecosaur
I'm not an expert by any means, but my guess is you have a more recent version of libhipblas than is expected. I wonder if you can downgrade?
EDIT: I've just seen you installed ROCm 6.0. I got it working with an older version : 5.6.0 I believe.
@ms178 commented on GitHub (Jan 10, 2024):
@tecosaur See: https://github.com/jmorganca/ollama/pull/1819
ROCm 5 and 6 are incompatible with one another. Ollama expects ROCm 5 at the moment. The linked pull request will make it compatible with ROCm 6.
@tecosaur commented on GitHub (Jan 10, 2024):
Ah interesting, I wasn't aware that the initial support was for ROCm 5.6 only. For what it's worth, the symlinks "hacky fix" seems to be working on my end.
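For anyone else on ROCm 6 hitting the same missing-soname issue, the "hacky fix" amounts to pointing the expected name at the newer library; a sketch, assuming ROCm lives under `/opt/rocm` (a comment further down does the same under `/usr/lib64`):

```bash
# ollama (at this point) looks for libhipblas.so.1; ROCm 6 only ships libhipblas.so.2
sudo ln -s /opt/rocm/lib/libhipblas.so.2 /opt/rocm/lib/libhipblas.so.1
```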
@tecosaur commented on GitHub (Jan 11, 2024):
I've noticed something funny when I'm not actively running inference with any model.
Restarting the ollama service immediately takes me from (3) to (1), but I find it strange that (3) happens in the first place.
@tecosaur commented on GitHub (Jan 12, 2024):
Oh, and my computer also refuses to go to sleep while the GPU is in state (2).
@65a commented on GitHub (Jan 12, 2024):
@tecosaur try setting
`GPU_MAX_HW_QUEUES` to `1` in your environment.
If Ollama wanted to, they could do something like `os.Setenv("GPU_MAX_HW_QUEUES", "1")` on the ROCm path, but I should probably just send a PR to llama.cpp at this point. It's a problem with the way HIP slams the ROCm scheduler with queues, which seems to be broken.
@dhiltgen commented on GitHub (Jan 20, 2024):
The pre-release for 0.1.21 is up now, and we've made various improvements to support ROCm cards, covering both v5 and v6 of the ROCm libraries. You'll have to install ROCm, but then the Ollama binary should work.
Please let us know if you run into any problems.
@dhiltgen commented on GitHub (Jan 20, 2024):
@tecosaur this is a known issue tracked via #1848
@babariviere commented on GitHub (Jan 20, 2024):
@dhiltgen will the docker image support rocm? Or do we need to make our own dockerfile for this?
@dhiltgen commented on GitHub (Jan 20, 2024):
We haven't added it to our image Dockerfile yet.
@babariviere commented on GitHub (Jan 21, 2024):
Good to know thanks. 😄 Hope it will come one day!
@dhiltgen commented on GitHub (Jan 21, 2024):
@babariviere #2127 once merged will add rocm support to our official container image.
@zaskokus commented on GitHub (Jan 22, 2024):
@dhiltgen 1.21 keeps using CPU only for me. I have the latest packages from archlinux, running on rdna3. Am I missing something? (The model I've used is `tinyllama`, if that makes any difference.)
edit: I used the binary from the release section:
@0xdeafbeef commented on GitHub (Jan 22, 2024):
Exporting `LD_LIBRARY_PATH=/usr/lib64/` will fix the lib search.
Append `Environment='LD_LIBRARY_PATH=/usr/lib64/'` to the `[Service]` section of `/etc/systemd/system/ollama.service`.
Then:
UPD:
but it still doesn't use the gpu anyway :)
@mnn commented on GitHub (Jan 22, 2024):
I am getting
`not enough vram available, falling back to CPU only` (in both the compiled main `6225fde046` and the binary from the release section here). I tried with dolphin-mistral-q4_0 and tinyllama. I am quite sure a 1B model should fit in the VRAM of a 7900XTX. Maybe I am doing something wrong; I tried a bunch of env. vars and I don't think they affect anything (it looks like it doesn't even try to use the GPU; I remember with Stable Diffusion similar-sounding wrong env. vars led to segfaults or pc freezes).
logs
@adham-omran commented on GitHub (Jan 23, 2024):
Thank you for your work and efforts
I'm running into an issue on Arch Linux
Issues
When I run
`HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1032 ./ollama-linux-amd64 serve`, it returns:
Log
However when I run
`./ollama-linux-amd64 run llama2`, I get:
Log
With a bash shell
Bash Shell Log
System Details
`/opt/rocm/bin/rocm-smi`
`pacman -Q | grep 'hip\|rocm'`
@hiepxanh commented on GitHub (Jan 23, 2024):
It only supports the rx 6800 and 7600, as the ROCm home page mentions; trying llamafile can help you - I also have an rx 6600.
@dhiltgen commented on GitHub (Jan 23, 2024):
It looks like multiple people are hitting this
`free(): invalid pointer` problem - I've opened up a new ticket to track the resolution of that: #2165
@dhiltgen commented on GitHub (Jan 23, 2024):
@adham-omran can you clarify the scenario when you override with
`gfx1032`? Does the server "work" and are you able to run models on the GPU?
@dhiltgen commented on GitHub (Jan 24, 2024):
@mnn keep an eye on PR #2162 which may fix or at the very least help us troubleshoot why VRAM discovery isn't working on your system
@dhiltgen commented on GitHub (Jan 24, 2024):
@0xdeafbeef could you share more of the server log showing startup and loading the llm library so we can see why it's not working correctly on your ROCm setup?
@mnn commented on GitHub (Jan 24, 2024):
@dhiltgen I have tried the new logging (used commit
`f63dc2db5c`), but I think it crashed on NVidia-related code before it got to AMD (just my guess, I know close to nothing about Go or GPU stuff). It behaves the same without env. vars.
[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST /api/chat --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST /api/show --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] GET / --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET /api/version --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD / --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func3 (5 handlers)
time=2024-01-24T08:16:50.397+01:00 level=INFO source=/mnt/dev/ai/ollama/server/routes.go:943 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-01-24T08:16:50.397+01:00 level=INFO source=/mnt/dev/ai/ollama/llm/payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/llm/payload_common.go:145 msg="Dynamic LLM libraries [cpu_avx2 rocm_v1 cpu cpu_avx cuda_v12]"
time=2024-01-24T08:16:53.664+01:00 level=DEBUG source=/mnt/dev/ai/ollama/llm/payload_common.go:146 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:91 msg="Detecting GPU type"
time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:210 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-01-24T08:16:53.665+01:00 level=DEBUG source=/mnt/dev/ai/ollama/gpu/gpu.go:228 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so /usr/lib/x86_64-linux-gnu/libnvidia-ml.so /usr/lib/wsl/lib/libnvidia-ml.so /usr/lib/wsl/drivers//libnvidia-ml.so /opt/cuda/lib64/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /mnt/dev/ai/ollama/libnvidia-ml.so*]"
time=2024-01-24T08:16:53.673+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:256 msg="Discovered GPU libraries: [/opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/lib/libnvidia-ml.so.545.29.06 /usr/lib64/libnvidia-ml.so.545.29.06]"
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING:
You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Linked to libnvidia-ml library at wrong path : /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
SIGSEGV: segmentation violation
PC=0x7fd014406300 m=14 sigcode=1
signal arrived during cgo execution
goroutine 1 [syscall]:
runtime.cgocall(0x9b4490, 0xc0004598a8)
/usr/lib/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000459880 sp=0xc000459848 pc=0x409b0b
github.com/jmorganca/ollama/gpu._Cfunc_cuda_init(0x7fcfd8000cb0, 0xc00003c300)
_cgo_gotypes.go:254 +0x3f fp=0xc0004598a8 sp=0xc000459880 pc=0x7b945f
github.com/jmorganca/ollama/gpu.LoadCUDAMgmt.func2(0xc00003a0d0?, 0x38?)
/mnt/dev/ai/ollama/gpu/gpu.go:266 +0x4a fp=0xc0004598e8 sp=0xc0004598a8 pc=0x7bb22a
github.com/jmorganca/ollama/gpu.LoadCUDAMgmt({0xc00052c000, 0x3, 0xc0000f2420?})
/mnt/dev/ai/ollama/gpu/gpu.go:266 +0x1b8 fp=0xc000459988 sp=0xc0004598e8 pc=0x7bb0f8
github.com/jmorganca/ollama/gpu.initGPUHandles()
/mnt/dev/ai/ollama/gpu/gpu.go:94 +0xd1 fp=0xc0004599f0 sp=0xc000459988 pc=0x7b98b1
github.com/jmorganca/ollama/gpu.GetGPUInfo()
/mnt/dev/ai/ollama/gpu/gpu.go:119 +0xb5 fp=0xc000459b00 sp=0xc0004599f0 pc=0x7b9a75
github.com/jmorganca/ollama/gpu.CheckVRAM()
/mnt/dev/ai/ollama/gpu/gpu.go:192 +0x1f fp=0xc000459ba8 sp=0xc000459b00 pc=0x7ba75f
github.com/jmorganca/ollama/server.Serve({0x19e64470, 0xc000024040})
/mnt/dev/ai/ollama/server/routes.go:965 +0x45f fp=0xc000459c98 sp=0xc000459ba8 pc=0x999b3f
github.com/jmorganca/ollama/cmd.RunServer(0xc000438300?, {0x1a2a87a0?, 0x4?, 0xaccea1?})
/mnt/dev/ai/ollama/cmd/cmd.go:690 +0x199 fp=0xc000459d30 sp=0xc000459c98 pc=0x9abf59
github.com/spf13/cobra.(*Command).execute(0xc0003f1500, {0x1a2a87a0, 0x0, 0x0})
/home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x87c fp=0xc000459e68 sp=0xc000459d30 pc=0x763c9c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003f0900)
/home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000459f20 sp=0xc000459e68 pc=0x7644c5
github.com/spf13/cobra.(*Command).Execute(...)
/home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
/mnt/dev/ai/ollama/main.go:11 +0x4d fp=0xc000459f40 sp=0xc000459f20 pc=0x9b2fcd
runtime.main()
/usr/lib/go/src/runtime/proc.go:267 +0x2bb fp=0xc000459fe0 sp=0xc000459f40 pc=0x43e1bb
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000459fe8 sp=0xc000459fe0 pc=0x46e081
goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006cfa8 sp=0xc00006cf88 pc=0x43e60e
runtime.goparkunlock(...)
/usr/lib/go/src/runtime/proc.go:404
runtime.forcegchelper()
/usr/lib/go/src/runtime/proc.go:322 +0xb3 fp=0xc00006cfe0 sp=0xc00006cfa8 pc=0x43e493
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006cfe8 sp=0xc00006cfe0 pc=0x46e081
created by runtime.init.6 in goroutine 1
/usr/lib/go/src/runtime/proc.go:310 +0x1a
goroutine 18 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000068778 sp=0xc000068758 pc=0x43e60e
runtime.goparkunlock(...)
/usr/lib/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
/usr/lib/go/src/runtime/mgcsweep.go:321 +0xdf fp=0xc0000687c8 sp=0xc000068778 pc=0x42a57f
runtime.gcenable.func1()
/usr/lib/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000687e0 sp=0xc0000687c8 pc=0x41f6c5
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000687e8 sp=0xc0000687e0 pc=0x46e081
created by runtime.gcenable in goroutine 1
/usr/lib/go/src/runtime/mgc.go:200 +0x66
goroutine 19 [GC scavenge wait]:
runtime.gopark(0x8b2ee4?, 0x7ed898?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000068f70 sp=0xc000068f50 pc=0x43e60e
runtime.goparkunlock(...)
/usr/lib/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x1a278b20)
/usr/lib/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000068fa0 sp=0xc000068f70 pc=0x427de9
runtime.bgscavenge(0x0?)
/usr/lib/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000068fc8 sp=0xc000068fa0 pc=0x428399
runtime.gcenable.func2()
/usr/lib/go/src/runtime/mgc.go:201 +0x25 fp=0xc000068fe0 sp=0xc000068fc8 pc=0x41f665
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000068fe8 sp=0xc000068fe0 pc=0x46e081
created by runtime.gcenable in goroutine 1
/usr/lib/go/src/runtime/mgc.go:201 +0xa5
goroutine 20 [finalizer wait]:
runtime.gopark(0x198?, 0xac5e60?, 0x1?, 0xf7?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006c620 sp=0xc00006c600 pc=0x43e60e
runtime.runfinq()
/usr/lib/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00006c7e0 sp=0xc00006c620 pc=0x41e6e7
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x46e081
created by runtime.createfing in goroutine 1
/usr/lib/go/src/runtime/mfinal.go:163 +0x3d
goroutine 21 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000069750 sp=0xc000069730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0000697e0 sp=0xc000069750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000697e8 sp=0xc0000697e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 22 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000069f50 sp=0xc000069f30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000069fe0 sp=0xc000069f50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000069fe8 sp=0xc000069fe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 3 [GC worker (idle)]:
runtime.gopark(0x59ab3bdc52a?, 0x1?, 0xa0?, 0x31?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006d750 sp=0xc00006d730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006d7e0 sp=0xc00006d750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006d7e8 sp=0xc00006d7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 34 [GC worker (idle)]:
runtime.gopark(0x59ab3bf2720?, 0x3?, 0xc6?, 0x18?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000488750 sp=0xc000488730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004887e0 sp=0xc000488750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004887e8 sp=0xc0004887e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 35 [GC worker (idle)]:
runtime.gopark(0x59ab3bdc399?, 0x1?, 0x5b?, 0x9c?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000488f50 sp=0xc000488f30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000488fe0 sp=0xc000488f50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000488fe8 sp=0xc000488fe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 4 [GC worker (idle)]:
runtime.gopark(0x59ab3bdc584?, 0x1?, 0xae?, 0x65?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006df50 sp=0xc00006df30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006dfe0 sp=0xc00006df50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006dfe8 sp=0xc00006dfe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 36 [GC worker (idle)]:
runtime.gopark(0x59ab3bf2e04?, 0x3?, 0x3a?, 0x23?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000489750 sp=0xc000489730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004897e0 sp=0xc000489750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004897e8 sp=0xc0004897e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 37 [GC worker (idle)]:
runtime.gopark(0x59ab3bdc52a?, 0x1?, 0xba?, 0x4d?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000489f50 sp=0xc000489f30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000489fe0 sp=0xc000489f50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000489fe8 sp=0xc000489fe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 23 [GC worker (idle)]:
runtime.gopark(0x59ab3bdd079?, 0x3?, 0xbc?, 0x18?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006a750 sp=0xc00006a730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006a7e0 sp=0xc00006a750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006a7e8 sp=0xc00006a7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 38 [GC worker (idle)]:
runtime.gopark(0x59ab3bf29ab?, 0x1?, 0x8c?, 0x6d?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00048a750 sp=0xc00048a730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00048a7e0 sp=0xc00048a750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00048a7e8 sp=0xc00048a7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 5 [GC worker (idle)]:
runtime.gopark(0x59ab008fefa?, 0x1?, 0x58?, 0x52?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006e750 sp=0xc00006e730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006e7e0 sp=0xc00006e750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 6 [GC worker (idle)]:
runtime.gopark(0x59ab3bdbc66?, 0x1?, 0x3a?, 0x21?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006ef50 sp=0xc00006ef30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006efe0 sp=0xc00006ef50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 7 [GC worker (idle)]:
runtime.gopark(0x59ab3bf269e?, 0x1?, 0xf5?, 0x46?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006f750 sp=0xc00006f730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006f7e0 sp=0xc00006f750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 8 [GC worker (idle)]:
runtime.gopark(0x59ab3bf273e?, 0x3?, 0xc1?, 0xa1?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006ff50 sp=0xc00006ff30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006ffe0 sp=0xc00006ff50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 9 [GC worker (idle)]:
runtime.gopark(0x1a2aa4e0?, 0x1?, 0xba?, 0x61?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000484750 sp=0xc000484730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004847e0 sp=0xc000484750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004847e8 sp=0xc0004847e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 10 [GC worker (idle)]:
runtime.gopark(0x59ab3bf30d5?, 0x3?, 0xa5?, 0x32?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000484f50 sp=0xc000484f30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000484fe0 sp=0xc000484f50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000484fe8 sp=0xc000484fe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c
goroutine 39 [select, locked to thread]:
runtime.gopark(0xc00048b7a8?, 0x2?, 0xa9?, 0xe8?, 0xc00048b7a4?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00048b638 sp=0xc00048b618 pc=0x43e60e
runtime.selectgo(0xc00048b7a8, 0xc00048b7a0, 0x0?, 0x0, 0x0?, 0x1)
/usr/lib/go/src/runtime/select.go:327 +0x725 fp=0xc00048b758 sp=0xc00048b638 pc=0x44e165
runtime.ensureSigM.func1()
/usr/lib/go/src/runtime/signal_unix.go:1014 +0x19f fp=0xc00048b7e0 sp=0xc00048b758 pc=0x46519f
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00048b7e8 sp=0xc00048b7e0 pc=0x46e081
created by runtime.ensureSigM in goroutine 1
/usr/lib/go/src/runtime/signal_unix.go:997 +0xc8
goroutine 24 [syscall]:
runtime.notetsleepg(0x0?, 0x0?)
/usr/lib/go/src/runtime/lock_futex.go:236 +0x29 fp=0xc0004527a0 sp=0xc000452768 pc=0x411209
os/signal.signal_recv()
/usr/lib/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc0004527c0 sp=0xc0004527a0 pc=0x46aa49
os/signal.loop()
/usr/lib/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc0004527e0 sp=0xc0004527c0 pc=0x6f3913
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004527e8 sp=0xc0004527e0 pc=0x46e081
created by os/signal.Notify.func1.1 in goroutine 1
/usr/lib/go/src/os/signal/signal.go:151 +0x1f
goroutine 25 [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000452f18 sp=0xc000452ef8 pc=0x43e60e
runtime.chanrecv(0xc0001538c0, 0x0, 0x1)
/usr/lib/go/src/runtime/chan.go:583 +0x3cd fp=0xc000452f90 sp=0xc000452f18 pc=0x40beed
runtime.chanrecv1(0x0?, 0x0?)
/usr/lib/go/src/runtime/chan.go:442 +0x12 fp=0xc000452fb8 sp=0xc000452f90 pc=0x40baf2
github.com/jmorganca/ollama/server.Serve.func1()
/mnt/dev/ai/ollama/server/routes.go:952 +0x25 fp=0xc000452fe0 sp=0xc000452fb8 pc=0x999c05
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000452fe8 sp=0xc000452fe0 pc=0x46e081
created by github.com/jmorganca/ollama/server.Serve in goroutine 1
/mnt/dev/ai/ollama/server/routes.go:951 +0x407
rax 0x7fcfd8001d40
rbx 0x9
rcx 0x1a
rdx 0x1a
rdi 0x7fcfe7ffebe0
rsi 0x100
rbp 0x7fcfe7ffee00
rsp 0x7fcfe7ffebd8
r8 0x64
r9 0x0
r10 0x7fd065a30e58
r11 0x7fd065ac13c0
r12 0xc00003c300
r13 0x7fcfe7ffedc0
r14 0x7fcfe7ffebe0
r15 0xc00003c370
rip 0x7fd014406300
rflags 0x10206
cs 0x33
fs 0x0
gs 0x0
I'll add more info, because I am not entirely sure whether only some specific ROCm version is supported (I have an older one, because it works well with other software), nor am I sure my env. vars are correct (the GFX_VERSION should be, since I am using it with SD WebUI, ComfyUI and Text Generation WebUI).
❯ yay -Q | grep -Pi rocm
rocm-clang-ocl 5.6.1-1
rocm-cmake 5.6.1-1
rocm-core 5.6.1-1
rocm-device-libs 5.6.1-1
rocm-hip-libraries 5.6.1-1
rocm-hip-runtime 5.6.1-1
rocm-hip-sdk 5.6.1-1
rocm-language-runtime 5.6.1-1
rocm-llvm 5.6.1-1
rocm-ml-libraries 5.6.1-1
rocm-ml-sdk 5.6.1-1
rocm-opencl-runtime 5.6.1-1
rocm-smi-lib 5.6.1-1
rocminfo 5.6.1-1
❯ /opt/rocm/bin/rocminfo
ROCk module is loaded
Agent 1: AMD Ryzen 7 7800X3D 8-Core Processor (CPU, 16 compute units)
Agent 2: gfx1100 - AMD Radeon RX 7900 XTX (GPU, 96 CUs, ~24 GB pool, ISA amdgcn-amd-amdhsa--gfx1100)
Agent 3: gfx1036 - AMD Radeon Graphics (iGPU, 2 CUs, 512 MB pool, ISA amdgcn-amd-amdhsa--gfx1036)
*** Done ***
@adham-omran commented on GitHub (Jan 24, 2024):
No, I am unable to run models with my 6600 even with override.
@0xdeafbeef commented on GitHub (Jan 24, 2024):
It fails because Fedora doesn't yet ship hipblas. I'll check later using docker.
@0xdeafbeef commented on GitHub (Jan 24, 2024):
after symlinking
`ln -s /usr/lib64/libhipblas.so.2 /usr/lib64/libhipblas.so.1`, it gets stuck here
@dhiltgen commented on GitHub (Jan 24, 2024):
@mnn Do you have any NVIDIA cards in your system? If not, as a workaround, you could uninstall the CUDA libraries so we don't try to probe for CUDA cards, but ollama shouldn't crash like that. I'll add some more verbose logging in that cuda_init routine so we can try to understand why it's crashing and fix the bug so it continues on gracefully.
@dhiltgen commented on GitHub (Jan 24, 2024):
@0xdeafbeef we haven't pushed an updated official image yet, but I've pushed an image to
`dhiltgen/ollama:0.1.21-rc3` which you could try with something like:
Also note that recent builds support both ROCm v6 and v5, if that helps get the necessary dependencies on your system, so you shouldn't have to symlink different versions.
@0xdeafbeef commented on GitHub (Jan 24, 2024):
Fails with null pointer :(
@dhiltgen commented on GitHub (Jan 24, 2024):
@0xdeafbeef it looks like it got past the detection logic and was deep in llama.cpp/ROCm when things went bad. I'm wondering if this is due to us not targeting your specific GPU processor. I can build a test container image with different GPU targets to try out. Can you share what type of Radeon card you have?
@0xdeafbeef commented on GitHub (Jan 24, 2024):
@dhiltgen commented on GitHub (Jan 24, 2024):
It looks like we're already building with that target so my theory doesn't work. There's something else going wrong.
Until we can figure this one out, you can force it to run on the CPU with
`OLLAMA_LLM_LIBRARY="cpu_avx2"`.
@dixonl90 commented on GitHub (Jan 24, 2024):
@dhiltgen thanks for making an image. It also crashes for me when I try and send a prompt.
@dhiltgen commented on GitHub (Jan 24, 2024):
@0xdeafbeef can you share the output of
`rocm-smi --showdriverversion --showproductname --showhw` and `rocm-smi -V`?
@dixonl90 can you confirm the crash looks similar to https://github.com/ollama/ollama/issues/738#issuecomment-1908674324 ?
I've pushed an updated image
`dhiltgen/ollama:0.1.21-rc4` which has some more debug logging enabled. For these systems where you're seeing a crash when trying to send a prompt, could you share the log output before sending a prompt?
The image I've pushed is compiled with ROCm v6, so if the host is v5, maybe that's a possible cause. I'll try to build a v5 based image and push that as well to see if that might yield a working setup.
@dhiltgen commented on GitHub (Jan 24, 2024):
I've pushed up
`dhiltgen/ollama:0.1.21-rocmv5` to explore if this is a rocm/driver mismatch in the container. On my test setup with a v6 host, the v5 container works, but maybe the reverse isn't true.
@zaskokus commented on GitHub (Jan 24, 2024):
@dhiltgen https://nopaste.net/9UBIqwB41B
@0xdeafbeef commented on GitHub (Jan 24, 2024):
@zaskokus commented on GitHub (Jan 24, 2024):
@0xdeafbeef
@dhiltgen commented on GitHub (Jan 24, 2024):
@zaskokus I think you've hit #2054 - We don't have a fix for that yet, but hopefully the workaround noted on that issue will work for you so you can force ollama to run just on the discrete GPU and ignore your iGPU.
@0xdeafbeef commented on GitHub (Jan 24, 2024):
I have igpu, but it should be disabled
@dhiltgen commented on GitHub (Jan 24, 2024):
@0xdeafbeef in that case, I wonder if you've also hit #2054. If we're reporting
`discovered 2 ROCm GPU Devices` in debug mode, you might want to try the workaround to force it to only use the discrete GPU.
@dixonl90 commented on GitHub (Jan 25, 2024):
@dhiltgen using your latest image, I get the following error. Looks to be different from above?
@dixonl90 commented on GitHub (Jan 25, 2024):
Actually, running the dhiltgen/ollama:0.1.21-rocmv5 image works. Although the output from a prompt is just hashes:
@kescherCode commented on GitHub (Jan 25, 2024):
The host ROCm versions do not matter. All that matters is the ROCm version within the container.
The root cause:
`rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1032` (in the prior log by @dixonl90, it says `gfx803` instead).
My system uses a 6600 XT (`gfx1032`). The crash is clearly caused by rocBLAS not supporting it directly. However, adding `-e HSA_OVERRIDE_GFX_VERSION=10.3.0`, thereby making ROCm assume my card is a `gfx1030` (which would stand for a 6900 XT), makes it function. The reason for this is that the ISAs from gfx1030 through gfx1035 are identical.
This crash is partially caused by ROCm/rocBLAS not implementing the condition for the above on their own, but the crash ultimately shouldn't happen to begin with. Considering it is a cgo crash, this might be a llama.cpp issue.
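As a concrete sketch of that override for a containerised setup (using the v5 test image mentioned earlier in the thread; the device flags are assumptions, not quoted from the comment above):

```bash
# run the ROCm build with the card masked as gfx1030 so rocBLAS finds matching kernels
docker run -d \
  --device /dev/kfd --device /dev/dri \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -v ollama:/root/.ollama -p 11434:11434 \
  dhiltgen/ollama:0.1.21-rocmv5
```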
@dhiltgen commented on GitHub (Jan 26, 2024):
Poking around in the ROCm v5 and v6 container images, it does appear v6 has dropped support for these older
`gfx803` cards. @dixonl90 it sounds like you'll have to stay on the older v5 stack to retain compatibility with your GPU, as it is now EOL on v6 and up.
@dhiltgen commented on GitHub (Jan 27, 2024):
We've just pushed an updated release v0.1.22 which has some misc ROCm fixes, including the iGPU fix. There's also a container image now specific for ROCm support based on v5.
`ollama/ollama:0.1.22-rocm`
@ignacio82 commented on GitHub (Jan 27, 2024):
@dhiltgen could you share a docker-compose file that uses ollama/ollama:0.1.22-rocm?
@Airradda commented on GitHub (Jan 27, 2024):
This is what has worked for me:
Then I use a model file like this to use only the gpu:
@ignacio82 commented on GitHub (Jan 27, 2024):
Thanks @Airradda This is what I have:
But I cannot connect using ollama-webui. I get
`404 page not found` when I try to go to http://192.168.86.124:11434/api. What am I missing? Thanks again!
@Airradda commented on GitHub (Jan 27, 2024):
@ignacio82 This is what I quickly spun up. It's worth noting that both are running on the same machine:
@ignacio82 commented on GitHub (Jan 27, 2024):
I believe there is a problem with image: ollama/ollama:0.1.22-rocm. The following docker-compose does not work:
When I type something on the UI, I get `Uh-oh! There was an issue connecting to Ollama.`. However, if I just change to ollama/ollama:latest everything works fine. Any ideas?
@hiepxanh commented on GitHub (Jan 28, 2024):
@Airradda which GPU are you using? Can you share your working stack?
@Airradda commented on GitHub (Jan 28, 2024):
System
GPU: 6950 XT (I think `gfx1030`)
CPU: Ryzen 9 3900X
Container Engine: Podman v4.9.0
Compose File
Modelfile
@RootPrivileges commented on GitHub (Jan 28, 2024):
@ignacio82 this is working for me. It uses Docker name resolution to dynamically point the webui to the ollama container while staying inside of Docker (I'm wondering if the IP you've hardcoded in the env variable in your compose has changed since you found the IP maybe?)
I can see the AMD library loading in the container log output, and the GPU getting detected. Unfortunately, I only have a 2GB VRAM iGPU, so it falls back to CPU-only, but all the logs up until then appear like it would use the GPU correctly if I had one that met the minimum requirements. I am, however, able to make queries to the model and get answers back, even in this degraded state.
@dhiltgen commented on GitHub (Jan 28, 2024):
@ignacio82 if you're still having troubles, I'd suggest we isolate this down to reduce variables. Focus on just getting the rocm container to run on the GPU and process prompts without adding the webui complexity into the mix, then once that's working, add the webui back. If you
`docker exec` into the running ollama container, try to use the CLI, and confirm that works, and check the container logs to verify the server is seeing your GPU and running on it.
@ignacio82 commented on GitHub (Jan 28, 2024):
@dhiltgen I think my problem is with
`ollama-rocm`. I can get into the container by running `sudo docker exec -ti ollama-rocm bash`. However, when I try to run a model I get kicked out of the container:
I'm not sure how to further debug this. Thanks for the help.
@kescherCode commented on GitHub (Jan 29, 2024):
@ignacio82 Try running
`docker logs ollama-rocm -f` in a separate window
ollama_logs.txt
I attached the logs. I'm guessing this is the problem:
@kescherCode commented on GitHub (Jan 29, 2024):
@ignacio82 Ah, you need to set
`HSA_OVERRIDE_GFX_VERSION` to `10.3.0` as an env var for the container when running it, since the ISAs for gfx1030 to gfx1035 are identical, but ROCm happily ignores that fact.
@hiepxanh commented on GitHub (Jan 29, 2024):
They just fixed it a few days ago, after I'd been crying about it for a week. You'll need to wait a few more weeks until they decide to build the new merge:
https://github.com/ROCm/Tensile/pull/1862
@meminens commented on GitHub (Jan 30, 2024):
Can someone please help get ollama/ROCm set up on Arch Linux with an AMD 7900 XTX GPU? Thank you!
@kokizzu commented on GitHub (Jan 30, 2024):
You guys rock 🥇
Tested with a 6600 XT: the same query took 20-30s, while using the CPU it took 60s.
https://kokizzu.blogspot.com/2024/01/ollama-with-amd-gpu-rocm.html
@ignacio82 commented on GitHub (Feb 1, 2024):
@kescherCode it made no difference
Any ideas?
@kescherCode commented on GitHub (Feb 1, 2024):
@ignacio82 if the second command is within the container, you did not properly run it. Try running rocminfo with HSA_OVERRIDE_GFX_VERSION set to 10.3.0. So prepend the
`rocminfo` with HSA_OVERRIDE_GFX_VERSION=10.3.0, or just make sure to run the container with that env var set.
@ignacio82 commented on GitHub (Feb 2, 2024):
@kescherCode what I have in my previous post is outside the container, just to show the graphics card that I have. This is what I get inside the container:
Does it matter that inside the container it says gfx1030 and outside gfx1035 ?? My docker-compose has
`HSA_OVERRIDE_GFX_VERSION=10.3.0` set as an environment variable.
@ignacio82 ah, now I see. gfx1035 is actually a mobile GPU, which has no VRAM on its own.
@ignacio82 commented on GitHub (Feb 2, 2024):
@kescherCode that means I cannot get any GPU acceleration?
@mkesper commented on GitHub (Feb 2, 2024):
@kescherCode In llama.cpp you can use unified memory with integrated GPUs; would that be possible here, too? Otherwise it makes no sense on iGPUs.
@dhiltgen commented on GitHub (Feb 2, 2024):
Integrated GPUs from AMD are not currently supported.
@Th3Rom3 commented on GitHub (Feb 3, 2024):
Just wanted to leave a quick note that ollama runs great with a current native release installation with an AMD RX6800 GPU on Fedora 39.
Thanks for implementing it into the main branch. I used to build my own version with ROCm before but now it runs with less effort needed using the mainline release.
@Knallli commented on GitHub (Feb 4, 2024):
@dhiltgen why is that if I may ask?
@dhiltgen commented on GitHub (Feb 5, 2024):
@Knallli with our current configuration for llama.cpp, the resulting builds crash on iGPUs. At this point we're focused on enabling discrete GPUs first, and once that's in good shape, we can evaluate if supporting iGPUs is possible in the future.
@mkesper commented on GitHub (Feb 5, 2024):
Which is completely reasonable as AMD themselves do not support ROCm on iGPUs, sadly:
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/amdgpu-install.html#installation-via-amdgpu-installer
@tolasing commented on GitHub (Feb 5, 2024):
I am unable to build from source; I keep getting this error:
@kescherCode commented on GitHub (Feb 5, 2024):
No ROCm Docker image seems to have been published for 0.1.23.
@dhiltgen commented on GitHub (Feb 5, 2024):
Sorry about that. The image has been pushed.
@Th3Rom3 commented on GitHub (Feb 8, 2024):
After successfully getting ollama running on my RX6800 with ROCm on bare metal, I struggle to get the same running within Docker. It might just be my ignorance, so let me know if this derails this issue too much or is beyond its scope.
I can run ollama within the docker container using CPU inference just fine, but I fail to get my GPU recognized using the 0.1.23-rocm or 0.1.22-rocm docker images. ROCm works fine on the host system, but I fail to get it to run within a minimum viable container setup.
I have spun up a fresh Ubuntu 22.04.3 installation (logs shown above are on this system) to exclude issues with my niche forked Fedora main system. But with the exact same result.
I have tried the ROCm setup with the following two parameter sets:
--usecase=graphics,rocm
--usecase=graphics,rocm --no-dkms
I have followed most of this issue thread, but maybe I am still missing something obvious; any pointers are appreciated.
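Those parameter sets correspond to invocations of AMD's amdgpu-install helper; roughly (a sketch based on the installer documentation linked earlier in the thread):

    # first attempt: graphics plus ROCm usecases
    sudo amdgpu-install --usecase=graphics,rocm
    # second attempt: same usecases, skipping the DKMS kernel module
    sudo amdgpu-install --usecase=graphics,rocm --no-dkms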
Addendum: In the meantime I have tried different kernels under Ubuntu (6.2 and 6.3) as referenced by older versions of the ROCm documentation as requirements for amdgpu-dkms. No changes.
@chiragkrishna commented on GitHub (Feb 9, 2024):
I've installed CLBlast and ROCm, set the environment to HSA_OVERRIDE_GFX_VERSION=9.0.6 and AMDGPU_TARGET=gfx906, and built ollama. When I run ollama I can see memory usage in radeontop, but there is no reply; it just keeps spinning. Here are the logs: logs.log. It was working before, but now it doesn't.
@ghost commented on GitHub (Feb 9, 2024):
How did you achieve that? I'm also on Fedora, but my discrete GPU is never used with Ollama.
@ghost commented on GitHub (Feb 9, 2024):
When I try to select GPUs:
Unable to find image 'ollama/ollama:latest' locally
latest: Pulling from ollama/ollama
57c139bbda7e: Pull complete
efa866b73628: Pull complete
a03f4e4cf912: Pull complete
Digest: sha256:3bc28f48a60ee34574dca0b0e310eff21e171b55d83fa06384bd83b97d9482b8
Status: Downloaded newer image for ollama/ollama:latest
fb68c6a21f168d3a0582cfc4b8891d80e42fb896c507e915a9c9a44a63c5e58a
docker: Error response from daemon: could not select device driver "" with capabilities: gpu.
(base) bash-5.2$
@Th3Rom3 commented on GitHub (Feb 9, 2024):
@CaioPrioridosSantos What GPU are you running? It looks like you are trying to run a docker setup? If you want to use an AMD card with docker, you need the separate ROCm-enabled release: ollama/ollama:0.1.23-rocm.
As per the documentation, ROCm is not merged into ollama:latest due to the very large file size of the ROCm libraries that are included (and needed) in the ROCm release.
But as I've stated above so far I have not managed to get it to run within Docker myself. Only when installed on my bare metal machine using the bash script installation.
For reference this is the startup log of my bare metal/host system:
It detects and uses my GPU while using the installed ROCm 6.0.2 libraries without fail.
@havfo commented on GitHub (Feb 9, 2024):
I have a working docker setup for my RX6700XT on Debian testing/unstable. I am using the latest ROCm libraries for Debian. I run the container like this:
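The exact command isn't preserved in this mirror; a typical invocation of the ROCm image looks roughly like this (device passthrough flags, volume, and tag are assumptions, not necessarily the flags used here):

    # hedged sketch: hand the GPU devices to the ROCm-enabled image
    docker run -d --device /dev/kfd --device /dev/dri \
      -v ollama:/root/.ollama -p 11434:11434 \
      --name ollama ollama/ollama:0.1.23-rocm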
Some logs from the container:
@zaskokus commented on GitHub (Feb 9, 2024):
Guys, gals, everyone: with the latest Arch Linux, ROCm 6.0, and the latest ollama from the releases page, everything just works. That includes using BOTH GPUs (iGPU and eGPU in my case) at the same time.
@meminens commented on GitHub (Feb 9, 2024):
That's great news! Have you installed ollama with the bash script on the home page? How do you tell ollama to use the dGPU?
@zaskokus commented on GitHub (Feb 10, 2024):
@misaligar I just downloaded the compiled file from the releases, added the executable flag and... that's it. I have all the ROCm libs/pkgs from the Arch Linux repo, everything is vanilla config; I have no extra flags added, no preloads, no variables before the command itself. I have NOT used the packaged ollama version, but the GitHub binary.
@meminens commented on GitHub (Feb 10, 2024):
Thank you so much! I just did the same and, like you said, it worked right off the bat. Amazing! I don't have to compile with custom flags anymore.
@ghost commented on GitHub (Feb 10, 2024):
@Th3Rom3
Hello. I'm on a notebook, Fedora 39, AMD Ryzen™ 9 6900HS with Radeon™ Graphics × 16, and an AMD Radeon Rx6800s 8gb.
Yes, I'm on Docker, but if needed, I can remove the ollama on Docker and restart with the bash installation.
Thank you for sharing the image. It looks like it worked for you. What did you do beforehand for it to recognize ROCm and the GPU?
Thank you
@chiragkrishna commented on GitHub (Feb 10, 2024):
I can see the memory changes in radeontop, but there is no output. Looks like I have to wait for AMD integrated graphics support.
@zaskokus commented on GitHub (Feb 10, 2024):
@chiragkrishna I confirmed a few posts back that iGPUs work well with ROCm 6.0; what's more, they work in tandem with a dGPU/eGPU out of the box.
@chiragkrishna commented on GitHub (Feb 10, 2024):
@zaskokus I'm using ROCm 6.0.2 and downloaded the latest ollama, but it is still stuck here:
ollama_logs.txt
The ROCm test is good and I can run Stable Diffusion:
test-rocm.py
@Th3Rom3 commented on GitHub (Feb 11, 2024):
I have managed to figure out my problems with running ROCm powered ollama in a Docker container. Somehow the Docker Engine setup bundled with the Linux version of Docker Desktop does not work with my GPU setup. After purging Docker and manually installing the docker engine it now works flawlessly both in a container and on bare metal.
@CaioPrioridosSantos Disclaimer: I was testing on a forked Fedora distro called Nobara. Said distro comes with some ROCm dependencies preinstalled, although I later updated the libraries to 6.0.2 manually.
I find it easier to test with a local installation first, since it removes the added complications of handing the GPU to a docker container. You can start by running rocminfo on your bare metal system to see if it works at all (although it isn't strictly necessary to have the full ROCm stack set up both inside the container and on the host system).
@ghost commented on GitHub (Feb 11, 2024):
Nothing yet. The install bash script can't recognize my AMD GPU; it only searches for NVIDIA GPUs, but I have no NVIDIA card.
(base) bash-5.2$ ollama serve
time=2024-02-11T15:35:47.939Z level=INFO source=images.go:863 msg="total blobs: 0"
time=2024-02-11T15:35:47.939Z level=INFO source=images.go:870 msg="total unused blobs removed: 0"
time=2024-02-11T15:35:47.939Z level=INFO source=routes.go:999 msg="Listening on 127.0.0.1:11434 (version 0.1.24)"
time=2024-02-11T15:35:47.940Z level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-11T15:35:49.691Z level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [rocm_v6 rocm_v5 cpu cpu_avx2 cpu_avx cuda_v11]"
time=2024-02-11T15:35:49.691Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-11T15:35:49.691Z level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-11T15:35:49.693Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/usr/lib64/libnvidia-ml.so.470.223.02]"
time=2024-02-11T15:35:49.894Z level=INFO source=gpu.go:300 msg="Unable to load CUDA management library /usr/lib64/libnvidia-ml.so.470.223.02: nvml vram init failure: 9"
time=2024-02-11T15:35:49.894Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-11T15:35:49.895Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-11T15:35:49.895Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:35:49.895Z level=INFO source=routes.go:1022 msg="no GPU detected"
But in the docker container it is recognized, as in these logs:
(base) bash-5.2$ docker logs ollama
time=2024-02-11T14:59:53.833Z level=INFO source=images.go:863 msg="total blobs: 0"
time=2024-02-11T14:59:53.833Z level=INFO source=images.go:870 msg="total unused blobs removed: 0"
time=2024-02-11T14:59:53.834Z level=INFO source=routes.go:999 msg="Listening on [::]:11434 (version 0.1.24)"
time=2024-02-11T14:59:53.834Z level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-11T14:59:55.676Z level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 rocm_v5 cpu cuda_v11 rocm_v6]"
time=2024-02-11T14:59:55.676Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-11T14:59:55.676Z level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-11T14:59:55.677Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-11T14:59:55.677Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-11T14:59:55.678Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
time=2024-02-11T14:59:55.680Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-11T14:59:55.680Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T14:59:55.680Z level=INFO source=gpu.go:177 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
[GIN] 2024/02/11 - 15:08:35 | 200 | 53.445µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/02/11 - 15:08:35 | 404 | 216.646µs | 127.0.0.1 | POST "/api/show"
time=2024-02-11T15:08:38.651Z level=INFO source=download.go:136 msg="downloading 2609048d349e in 65 115 MB part(s)"
time=2024-02-11T15:10:40.100Z level=INFO source=download.go:136 msg="downloading 8c17c2ebb0ea in 1 7.0 KB part(s)"
time=2024-02-11T15:10:43.706Z level=INFO source=download.go:136 msg="downloading 7c23fb36d801 in 1 4.8 KB part(s)"
time=2024-02-11T15:10:48.088Z level=INFO source=download.go:136 msg="downloading 2e0493f67d0c in 1 59 B part(s)"
time=2024-02-11T15:10:51.566Z level=INFO source=download.go:136 msg="downloading fa304d675061 in 1 91 B part(s)"
time=2024-02-11T15:10:54.968Z level=INFO source=download.go:136 msg="downloading be61bcdf308e in 1 558 B part(s)"
[GIN] 2024/02/11 - 15:11:01 | 200 | 2m25s | 127.0.0.1 | POST "/api/pull"
[GIN] 2024/02/11 - 15:11:01 | 200 | 659.444µs | 127.0.0.1 | POST "/api/show"
[GIN] 2024/02/11 - 15:11:01 | 200 | 238.289µs | 127.0.0.1 | POST "/api/show"
time=2024-02-11T15:11:01.766Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:11:01.766Z level=INFO source=gpu.go:177 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
time=2024-02-11T15:11:01.766Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:11:01.766Z level=INFO source=gpu.go:177 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
time=2024-02-11T15:11:01.766Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:11:01.861Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama2384028714/rocm_v5/libext_server.so"
time=2024-02-11T15:11:01.861Z level=INFO source=dyn_ext_server.go:145 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
Device 0: AMD Radeon RX 6800S, compute capability 10.3, VMM: no
llama_model_loader: loaded meta data with 23 key-value pairs and 363 tensors from /root/.ollama/models/blobs/sha256:2609048d349e7c70196401be59bea7eb89a968d4642e409b0e798b34403b96c8 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.context_length u32 = 4096
llama_model_loader: - kv 3: llama.embedding_length u32 = 5120
llama_model_loader: - kv 4: llama.block_count u32 = 40
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 13824
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 40
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 40
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: tokenizer.chat_template str = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q4_0: 281 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 5120
llm_load_print_meta: n_embd_v_gqa = 5120
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 6.86 GiB (4.53 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.28 MiB
llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloaded 22/41 layers to GPU
llm_load_tensors: ROCm0 buffer size = 3744.30 MiB
llm_load_tensors: CPU buffer size = 7023.90 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 880.00 MiB
llama_kv_cache_init: ROCm_Host KV buffer size = 720.00 MiB
llama_new_context_with_model: KV self size = 1600.00 MiB, K (f16): 800.00 MiB, V (f16): 800.00 MiB
llama_new_context_with_model: ROCm_Host input buffer size = 14.01 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 213.40 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 209.00 MiB
llama_new_context_with_model: graph splits (measure): 5
time=2024-02-11T15:11:07.468Z level=INFO source=dyn_ext_server.go:156 msg="Starting llama main loop"
[GIN] 2024/02/11 - 15:11:07 | 200 | 5.922620794s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/02/11 - 15:11:36 | 200 | 15.424189225s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/02/11 - 15:14:27 | 200 | 2m5s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/02/11 - 15:17:02 | 200 | 9.417514508s | 127.0.0.1 | POST "/api/chat"
time=2024-02-11T15:38:07.045Z level=INFO source=images.go:863 msg="total blobs: 6"
time=2024-02-11T15:38:07.045Z level=INFO source=images.go:870 msg="total unused blobs removed: 0"
time=2024-02-11T15:38:07.046Z level=INFO source=routes.go:999 msg="Listening on [::]:11434 (version 0.1.24)"
time=2024-02-11T15:38:07.046Z level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-11T15:38:08.870Z level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [rocm_v6 cuda_v11 cpu_avx2 cpu_avx rocm_v5 cpu]"
time=2024-02-11T15:38:08.870Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-11T15:38:08.870Z level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-11T15:38:08.872Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-11T15:38:08.872Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-11T15:38:08.873Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
time=2024-02-11T15:38:08.889Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-11T15:38:08.889Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:38:08.889Z level=INFO source=gpu.go:177 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
@mkesper commented on GitHub (Feb 11, 2024):
Docker Desktop on Linux is a non-starter, as it needlessly uses a VM "to give you the same experience as on macOS and Windows". https://docs.docker.com/desktop/install/linux-install/ Please avoid this idiocy.
@Th3Rom3 commented on GitHub (Feb 11, 2024):
That is good to know but it was not obvious to me as it did not even come up once as a potential problem during my troubleshooting steps. Hence my post to make the issue visible for posterity.
It also does not help that the Docker Desktop for Linux context was left behind even after I followed the uninstall procedure. I had to remove it manually via docker context remove desktop-linux. But as always, this might have been due to an oversight on my part.
Bottom line: as of release 0.1.22, ollama works well for me with ROCm acceleration on an RX6800, both bare metal and in (native) docker.
@ghost commented on GitHub (Feb 11, 2024):
Hey, thanks for the quick response.
Is there a tutorial to update ROCm to 6.0.2 on Fedora?
@dhiltgen commented on GitHub (Feb 12, 2024):
@CaioPrioridosSantos these two lines when running on your host:
Indicate that you don't have the ROCm SMI library installed, which we currently use to query for GPU information. The package is likely called rocm-smi-lib or something along those lines. We are exploring refactoring the way we discover AMD GPUs to try to remove this dependency, but for now, you'll need to install that library for it to discover your GPU. The container image has the library bundled into the image.
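On most distros that is a single package install; a sketch (package names are assumptions and differ between distros, so verify with your package manager):

    # Debian/Ubuntu with AMD's ROCm repository (package name assumed)
    sudo apt install rocm-smi-lib
    # Fedora ships a similarly named package (check: dnf search rocm-smi)
    sudo dnf install rocm-smi
    # Arch Linux (package name assumed)
    sudo pacman -S rocm-smi-lib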
@askareija commented on GitHub (Feb 13, 2024):
I have an AMD laptop. My specifications are:
MSI Laptop - Bravo 15 B7E
OS: EndeavourOS Galileo (Based on Arch linux)
DE: Hyprland 0.35.0
CPU: AMD Ryzen 5 7535HS
iGPU: Radeon Graphics (gfx1035)
dGPU: Radeon RX 6550M (gfx1034)
RAM: 32GB DDR5-4800
I try to build with these steps:
1. git clone --recursive https://github.com/ollama/ollama.git
2. Then install some packages: sudo pacman -S rocm-hip-sdk rocm-opencl-sdk clblast go
3. After that I build with these params: AMDGPU_TARGET=gfx1034 HSA_OVERRIDE_GFX_VERSION=10.3.0 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
There are some warnings, but I don't know what kind because there are a lot; mostly deprecation things.
Now I run go build -tags rocm, but there is no feedback on the console. Then I ran ollama serve. It said "Radeon GPU detected", then right below "no GPU detected" again.
I try to run ollama run codellama:7b, but it says it's falling back to CPU. This is the result from rocm-smi and clinfo -l. Am I missing something here?
@chiragkrishna commented on GitHub (Feb 13, 2024):
Build with this: AMDGPU_TARGET=gfx1030
Run with this: HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
@askareija commented on GitHub (Feb 13, 2024):
OK, so I've removed ollama, cloned and rebuilt again with AMDGPU_TARGET=gfx1030 HSA_OVERRIDE_GFX_VERSION=10.3.0 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./... then ran HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve. But it's still saying "GPU not available, falling back to CPU".
I'm using ROCm 6.0.0 (according to pacman).
edit: git clone --recursive https://github.com/65a/ollama was not working, so I used git clone --recursive https://github.com/ollama/ollama
@ghost commented on GitHub (Feb 13, 2024):
The question is: I have ROCm, but I did not do the 6.0.2 installation; at the moment I have version 5.7.
@Th3Rom3 commented on GitHub (Feb 13, 2024):
@CaioPrioridosSantos
You can check what you have installed with dnf list installed | grep rocm-smi
It should be ROCm 5.7.1, based on the Fedora 39 repo.
You should not need ROCm 6.0.x for it to run accelerated. That was just something I was messing with personally.
@Venefilyn commented on GitHub (Feb 16, 2024):
I can't figure mine out, other than that it does seem to detect my GPU correctly. It crashes as soon as any model is used. I'm using the ollama/ollama:0.1.24-rocm build with
Specifically, from the logs below this seems to be the interesting error:
Log output
@Venefilyn commented on GitHub (Feb 16, 2024):
Never mind my above comment; I think it's because I'm running a GPU that doesn't support ROCm, a Radeon RX 5600 XT.
@askareija commented on GitHub (Feb 17, 2024):
Okay, so after many rebuilds I've got my AMD GPU detected on ROCm.
Now the weird thing is: when I use codellama:7b it's still using my CPU instead of the GPU; my GPU usage stays around 9% or below (monitoring using amdgpu_top & nvtop). But if I'm using deepseek-coder:1.3b it's blazingly fast and my GPU usage goes up to 50% or more.
So... what seems to be the problem here?
@mkesper commented on GitHub (Feb 19, 2024):
Hi Aden,
it would be interesting to know what you changed between the rebuilds.
For the different models: Did you check the VRAM requirements of each? If it won't fit it probably can only be run on CPU.
@askareija commented on GitHub (Feb 19, 2024):
@mkesper
Hi Michael,
Here are the steps I've gone through to make it work on my laptop (on Arch Linux):
git clone --recursive https://github.com/ollama/ollama.git
cd ollama
sudo pacman -S rocm-hip-sdk rocm-opencl-sdk clblast go
AMDGPU_TARGET=gfx1030 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve
Now I have ollama working with my GPU, and yes, after checking the VRAM it looks like the model doesn't fit, so it uses my CPU. My solution was to make my own Modelfile and then create a custom model:
./ollama create codellama:7b-22 -f ./Modelfile
Now I have utilized my GPU:

finally it's working
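The Modelfile referenced above isn't reproduced here; a hypothetical sketch along those lines, with the GPU layer count guessed from the custom tag name (codellama:7b-22), might be:

    # hypothetical Modelfile: cap the number of layers offloaded so the model fits in VRAM
    FROM codellama:7b
    PARAMETER num_gpu 22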
@badverybadboy commented on GitHub (Feb 20, 2024):
Is it possible for you to share the docker image, or share detailed instructions? Does the host OS need to have ROCm support?
@sid-cypher commented on GitHub (Feb 20, 2024):
You'll need the amdgpu DKMS drivers from the ROCm package on the host OS.
As for the image building, is the Dockerfile not enough?
@shanoaice commented on GitHub (Feb 22, 2024):
I do wonder, is this possible on Windows? AMD recently added HIP SDK support for Windows; or does this require something else that's not currently possible?
@dhiltgen commented on GitHub (Feb 26, 2024):
@shanoaice I've opened #2598 to track Radeon native Windows support
@mkesper commented on GitHub (Feb 27, 2024):
No luck with the iGPU and ollama:0.1.27-rocm.
It does not find /sys/module/amdgpu/version and reports negative memory.
time=2024-02-27T15:20:37.354Z level=INFO source=images.go:710 msg="total blobs: 8" time=2024-02-27T15:20:37.354Z level=INFO source=images.go:717 msg="total unused blobs removed: 0" time=2024-02-27T15:20:37.355Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.27)" time=2024-02-27T15:20:37.355Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..." time=2024-02-27T15:20:39.398Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [rocm_v6 cpu_avx2 cuda_v11 cpu rocm_v5 cpu_avx]" time=2024-02-27T15:20:39.398Z level=DEBUG source=payload_common.go:147 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY" time=2024-02-27T15:20:39.398Z level=INFO source=gpu.go:94 msg="Detecting GPU type" time=2024-02-27T15:20:39.398Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so" time=2024-02-27T15:20:39.398Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /opt/rocm/lib/libnvidia-ml.so* /usr/local/lib/libnvidia-ml.so* /opt/rh/devtoolset-7/root/libnvidia-ml.so*]" time=2024-02-27T15:20:39.399Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []" time=2024-02-27T15:20:39.399Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so" time=2024-02-27T15:20:39.399Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/opt/rocm*/lib*/librocm_smi64.so* /opt/rocm/lib/librocm_smi64.so* /usr/local/lib/librocm_smi64.so* /opt/rh/devtoolset-7/root/librocm_smi64.so*]" time=2024-02-27T15:20:39.400Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]" wiring rocm management library functions in /opt/rocm/lib/librocm_smi64.so.5.0.50701 dlsym: rsmi_init dlsym: rsmi_shut_down dlsym: rsmi_dev_memory_total_get dlsym: rsmi_dev_memory_usage_get dlsym: rsmi_version_get dlsym: rsmi_num_monitor_devices dlsym: rsmi_dev_id_get dlsym: rsmi_dev_name_get dlsym: rsmi_dev_brand_get dlsym: rsmi_dev_vendor_name_get dlsym: rsmi_dev_vram_vendor_get dlsym: rsmi_dev_serial_number_get dlsym: rsmi_dev_subsystem_name_get dlsym: rsmi_dev_vbios_version_get time=2024-02-27T15:20:39.401Z level=INFO source=gpu.go:109 msg="Radeon GPU detected" time=2024-02-27T15:20:39.401Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2" time=2024-02-27T15:20:39.401Z level=DEBUG source=gpu.go:158 msg="error looking up amd driver version: %s" !BADKEY="amdgpu file stat error: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-02-27T15:20:39.401Z level=DEBUG source=amd.go:76 msg="malformed gfx_target_version 0" discovered 1 ROCm GPU Devices [0] ROCm device name: Rembrandt [Radeon 680M] [0] ROCm brand: Rembrandt [Radeon 680M] [0] ROCm vendor: Advanced Micro Devices, Inc. 
[AMD/ATI] [0] ROCm VRAM vendor: unknown rsmi_dev_serial_number_get failed: 2 [0] ROCm subsystem name: 0x50b4 [0] ROCm vbios version: 113-REMBRANDT-X37 [0] ROCm totalMem 1073741824 [0] ROCm usedMem 925331456 time=2024-02-27T15:20:39.404Z level=DEBUG source=gpu.go:254 msg="rocm detected 1 devices with -882M available memory"dmesg|grep amdgpu:
[ 2.451312] [drm] amdgpu kernel modesetting enabled. [ 2.461887] amdgpu: Virtual CRAT table created for CPU [ 2.461910] amdgpu: Topology: Add CPU node [ 2.462061] amdgpu 0000:33:00.0: enabling device (0006 -> 0007) [ 2.465847] amdgpu 0000:33:00.0: amdgpu: Fetched VBIOS from VFCT [ 2.465850] amdgpu: ATOM BIOS: 113-REMBRANDT-X37 [ 2.465893] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_toc.bin [ 2.466021] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_ta.bin [ 2.466207] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_dmcub.bin [ 2.466384] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_pfp.bin [ 2.466569] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_me.bin [ 2.466755] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_ce.bin [ 2.466925] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_rlc.bin [ 2.467074] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_mec.bin [ 2.467265] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_mec2.bin [ 2.467598] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_vcn.bin [ 2.468072] amdgpu 0000:33:00.0: vgaarb: deactivate vga console [ 2.468075] amdgpu 0000:33:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default) [ 2.468122] amdgpu 0000:33:00.0: amdgpu: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used) [ 2.468124] amdgpu 0000:33:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF [ 2.468126] amdgpu 0000:33:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF [ 2.469069] [drm] amdgpu: 1024M of VRAM memory ready [ 2.469075] [drm] amdgpu: 15429M of GTT memory ready. [ 2.470296] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_sdma.bin [ 2.470358] amdgpu 0000:33:00.0: amdgpu: Will use PSP to load VCN firmware [ 2.658417] amdgpu 0000:33:00.0: amdgpu: RAS: optional ras ta ucode is not available [ 2.670602] amdgpu 0000:33:00.0: amdgpu: RAP: optional rap ta ucode is not available [ 2.670605] amdgpu 0000:33:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available [ 2.673063] amdgpu 0000:33:00.0: amdgpu: SMU is initialized successfully! 
[ 2.867007] kfd kfd: amdgpu: Allocated 3969056 bytes on gart [ 2.867028] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1 [ 2.867223] amdgpu: Virtual CRAT table created for GPU [ 2.867381] amdgpu: Topology: Add dGPU node [0x1681:0x1002] [ 2.867383] kfd kfd: amdgpu: added device 1002:1681 [ 2.867396] amdgpu 0000:33:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12 [ 2.867583] amdgpu 0000:33:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0 [ 2.867585] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ 2.867587] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ 2.867588] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0 [ 2.867589] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0 [ 2.867590] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0 [ 2.867591] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0 [ 2.867592] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0 [ 2.867593] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0 [ 2.867594] amdgpu 0000:33:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0 [ 2.867595] amdgpu 0000:33:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0 [ 2.867597] amdgpu 0000:33:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8 [ 2.867598] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8 [ 2.867599] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8 [ 2.867600] amdgpu 0000:33:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8 [ 2.876353] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:33:00.0 on minor 0 [ 2.886076] fbcon: amdgpudrmfb (fb0) is primary device [ 3.809581] amdgpu 0000:33:00.0: [drm] fb0: amdgpudrmfb frame buffer device [ 16.739507] snd_hda_intel 0000:33:00.1: bound 0000:33:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu]) [ 9128.244511] amdgpu 0000:33:00.0: amdgpu: SMU is resuming... [ 9128.246334] amdgpu 0000:33:00.0: amdgpu: SMU is resumed successfully! [ 9129.206014] amdgpu 0000:33:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0 [ 9129.206017] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ 9129.206019] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ 9129.206021] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0 [ 9129.206022] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0 [ 9129.206024] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0 [ 9129.206025] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0 [ 9129.206027] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0 [ 9129.206028] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0 [ 9129.206030] amdgpu 0000:33:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0 [ 9129.206032] amdgpu 0000:33:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0 [ 9129.206034] amdgpu 0000:33:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8 [ 9129.206035] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8 [ 9129.206036] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8 [ 9129.206038] amdgpu 0000:33:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8@mkesper commented on GitHub (Feb 27, 2024):
I for one wasn't able to build an image with it yet:
@dhiltgen commented on GitHub (Mar 2, 2024):
I know everyone's eager for a more stable AMD GPU setup for Ollama, so I wanted to give a quick update on where we're at and the current plan.
I just got Radeon cards working in windows, so I should have a PR up in the next day or two adding support for Windows ROCm (tracked via #2598)
I know a lot of folks have been seeing crashes with ROCm v5 on linux. Based on feedback from folks at AMD, they've recommended we focus on the latest major version of ROCm for the best experience, so I'm going to pivot to only supporting v6 on Linux in our official builds. We'll set this up so we can auto-detect v6 if already installed in the system, and if not detected, the install script will download an artifact from our releases page if we see an AMD GPU. The goal is you should only need to have the driver installed and Ollama will take care of the library dependencies.
On windows, v6 has not yet shipped, so we'll use v5 for now, but I believe the v6 release is imminent, so we'll switch to that once it's available. Again, we'll aim to carry the library in the installer to streamline the user experience, although it will result in a much larger installer due to the size of the ROCm tensor data files.
For folks with older GPUs that aren't supported by v6 (e.g. RX 580 #2453) what I'm hoping to do is refine our build process so you could install an older ROCm library version that does support it, and build from source locally to get a working setup. That will take some more work as workarounds are required, but that's our goal.
We're going to focus on Discrete GPUs first and get those stable, then we'll come back to add support for iGPUs with #2637
Thanks everyone for your patience as we work through the best approach to support Radeon GPUs.
@mnn commented on GitHub (Mar 3, 2024):
As far as I know that is not supported by other AI stuff like automatic1111. Just yesterday I tried to update ROCm to 6 and that ended in a frozen PC. So until at least a1111 supports ROCm 6, I am staying on stable 5.6.1 and text-generation-webui. To be frank, ollama even on CPU was still far from usable (it kept unloading the model after a few seconds, making it painfully slow to use). Kinda disappointed I wasted hours on this and it is still so far away...
@sid-cypher commented on GitHub (Mar 3, 2024):
A1111 relies on Torch, and using A1111 with it has always worked well for me (with an rx7900xtx, both ROCm 5.7 and 6.0.x).
There is also an option to build Torch for the newer ROCm, but in my experience the Torch "nightly/rocm5.7" build works just fine with 6.0.2. Your mileage may vary.
@Th3Rom3 commented on GitHub (Mar 5, 2024):
One quick comment on that (hopefully it does not stray too far offtopic):
Yeah, I agree; unless we're talking about running it on Windows, I did not have problems running Stable Diffusion with different ROCm/torch combinations.
BTW the torch build is no longer nightly-only; you can now use --index-url https://download.pytorch.org/whl/rocm5.7 (not sure if it redirects to the same repo).
However, I agree that ROCm is a bit of a mess on consumer cards. Hopefully they can finally find a stable base to build upon, with longer-term support for different card generations without dropping support (official or non-official) at the next minor version.
@frostworx commented on GitHub (Mar 7, 2024):
Thank you very much!
@dhiltgen commented on GitHub (Mar 11, 2024):
The pre-release for 0.1.29 is live and ready for broader testing.
To install on linux, you'll need to use an updated install script from my branch. (We'll merge this once we wrap up testing and mark the release latest) This new update to the install script will detect Radeon cards (via the amdgpu driver presence) and set up ROCm v6 for ollama if it's not already present on the host.
Windows users can install the OllamaSetup.exe from the 0.1.29 release page which includes ROCm v5.7 (latest at this time.)
We've updated our troubleshooting docs to include some pointers on Radeon GPU compatibility.
Please let us know if you run into any problems with the pre-release (on this issue, or open a new issue)
@JorySeverijnse commented on GitHub (Mar 12, 2024):
@dhiltgen I've tried to get my 6700 XT working with the install script above and have exported the env variable HSA_OVERRIDE_GFX_VERSION="10.3.0" because my card is gfx1031. I also installed rocblas and rocm-core on Arch, btw, so I actually have /opt/rocm/lib/librocblas.so. After running ollama serve I get:
time=2024-03-12T23:34:26.569+01:00 level=INFO source=images.go:806 msg="total blobs: 0"
time=2024-03-12T23:34:26.569+01:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-12T23:34:26.569+01:00 level=INFO source=routes.go:1082 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-12T23:34:26.569+01:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama3641506161/runners ..."
time=2024-03-12T23:34:28.405+01:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu cpu_avx cuda_v11 cpu_avx2 rocm_v60000]"
time=2024-03-12T23:34:28.405+01:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-12T23:34:28.405+01:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-12T23:34:28.411+01:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-12T23:34:28.411+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-12T23:34:28.411+01:00 level=WARN source=amd_linux.go:50 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-12T23:34:28.411+01:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx1031]"
time=2024-03-12T23:34:28.411+01:00 level=WARN source=amd_linux.go:339 msg="amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2024-03-12T23:34:28.412+01:00 level=WARN source=amd_linux.go:96 msg="unable to verify rocm library, will use cpu: no suitable rocm found, falling back to CPU"
time=2024-03-12T23:34:28.412+01:00 level=INFO source=routes.go:1105 msg="no GPU detected"
@dhiltgen commented on GitHub (Mar 13, 2024):
@JorySeverijnse the error "amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at ..." is why it didn't use the GPU. If you run with OLLAMA_DEBUG=1 you'll be able to see more information about where it's searching for ROCm. What is probably most relevant for your setup is to either set HIP_PATH or make sure LD_LIBRARY_PATH contains it. We also need v6, so if you have v5 installed, that won't work.
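A sketch of the debugging run described (the ROCm path is an assumption for an install outside the default search locations):

    # verbose GPU/ROCm discovery logging
    OLLAMA_DEBUG=1 ollama serve
    # point ollama at a ROCm v6 install in a non-standard location (path is an assumption)
    HIP_PATH=/opt/rocm ollama serve
    # or expose it via the dynamic linker path instead
    LD_LIBRARY_PATH=/opt/rocm/lib ollama serve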
@askareija commented on GitHub (Mar 13, 2024):
@dhiltgen I've tried the pre-release version 0.1.29; it's working well and detected my GPUs. The problem is I always get:
I think this is because I have an integrated GPU:
And I don't know how to set it to use only device 0 instead of all GPUs.
Full logs:
@shanoaice commented on GitHub (Mar 13, 2024):
@askareija You probably need to set ROCR_VISIBLE_DEVICES. For example, in this case you only want to use device 0, so you set ROCR_VISIBLE_DEVICES=0. If you are manually running the ollama server, then ROCR_VISIBLE_DEVICES=0 ollama serve will do the trick. Take a look at the docs if you want to modify the systemd service.
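For the systemd service, the approach from the docs is an environment override; a minimal sketch:

    # open a drop-in override for the service
    sudo systemctl edit ollama.service
    # add, under the [Service] section:
    #   Environment="ROCR_VISIBLE_DEVICES=0"
    # then reload and restart
    sudo systemctl daemon-reload
    sudo systemctl restart ollama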
@JorySeverijnse commented on GitHub (Mar 13, 2024):
Thanks for the fast reply. I had a look at the manual installation, but that was not for building from source or the Arch package. I've found the problem: I also had to install these two packages in order to get ROCm to work: rocm-hip-sdk rocm-opencl-sdk
@MorrisLu-Taipei commented on GitHub (Mar 13, 2024):
Hi, thanks to all for providing an easier way to use the RX 7900 with ollama.


The ollama docker is running well.
BUT the performance is terrible (compared to dual 3090s) with some prompts and base models.
Can anyone help? Thanks in advance.
@dhiltgen commented on GitHub (Mar 13, 2024):
@askareija you hit a bug around iGPU detection that I fixed yesterday. We haven't pushed updated builds to github yet with that fix as we're chasing down a couple other release blocker bugs, but we hope to have an updated build later today.
@windblade89 commented on GitHub (Apr 17, 2024):
How do I make it support an AMD Radeon RX 580 8GB card? Does anyone have a guide?
@shanoaice commented on GitHub (Apr 18, 2024):
I don't believe that ROCm supports GCN-architecture cards. You will need an RDNA card for this; the RX 580 will not work.
@unoexperto commented on GitHub (Apr 26, 2024):
Hi folks, sorry for reviving an old thread. I'm on ollama 0.1.32 and have ROCm 6 drivers installed. I modified /etc/systemd/system/ollama.service so that it contains the following overrides. After this I see in the log that ollama uses the "GPU", but the caveat is that I don't have a dedicated GPU. I'm on a Lenovo T14 Gen 4, which has an integrated video card (AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics).
As a result, ollama reports in the log that the GPU has 1GB of memory, which is obviously too little. Is there anything I could do to make it allocate all available shared VRAM?
@xyproto commented on GitHub (Apr 29, 2024):
Hi, I just packaged ollama-rocm for Arch Linux. I don't have access to an AMD graphics card right now. Please test if it works.
Are these environment variables required for ollama-rocm to work for as many users as possible?
@badverybadboy commented on GitHub (Apr 29, 2024):
I could test it out, but the issue is I only have a Vega 64, and according to the documentation https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html it seems it's no longer supported. Let me know if that's not the case and I could give it a try.
@badverybadboy commented on GitHub (Apr 29, 2024):
I saw that you referenced RX 580s (gfx803-gfx805) in the post, but there is no mention of the Vega (gfx900-906) architecture. Is it already working with some workaround? I could not get it to work with ROCm 6.x on my Ubuntu install. ROCm actually caused issues with the graphics card failing and things not working, so I could not proceed with the ROCm drivers and gave up. If there is a way to get it working with ROCm, I would really appreciate it.
@sfxworks commented on GitHub (Apr 29, 2024):
On my W6800 with those additions in the systemd unit:
I have the ROCm libs in /opt/rocm; I'm assuming it's looking for /opt/rocm/hip/lib?
@sfxworks commented on GitHub (Apr 29, 2024):
OLLAMA_LLM_LIBRARY=rocm_v60002 ROCR_VISIBLE_DEVICES=0 OLLAMA_DEBUG=1 ROCM_PATH="/opt/rocm" CLBlast_DIR="/usr/lib/cmake/CLBlast" HIP_PATH="/opt/rocm/hip/lib" HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve
Noooo idea where to go from here.
@sfxworks commented on GitHub (Apr 30, 2024):
Well, uhh, I also have this machine as a k8s node,
so I installed the ROCm k8s device plugin https://github.com/ROCm/k8s-device-plugin
and ran the rocm build.
It seems to be detected and works OK.
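With the device plugin deployed, the node advertises AMD GPUs as an extended resource; a quick sanity check might look like this (node name is a placeholder):

    # the ROCm device plugin exposes GPUs as the amd.com/gpu resource
    kubectl describe node <node-name> | grep -i 'amd.com/gpu'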
@sfxworks commented on GitHub (Apr 30, 2024):
I do wish it would remove just one layer from the GPU, since there's some overhead here; I keep getting OOM for the GPU/CPU split, but hey, it tries. Switching to 8x7b.
@sfxworks commented on GitHub (Apr 30, 2024):
Yep, works perfectly. Who knows what leftovers I have on this machine between the NVIDIA and AMD drivers on this Arch build. Containers rule.