mirror of
https://github.com/ollama/ollama.git
synced 2026-05-07 08:30:05 -05:00
Closed
opened 2026-04-12 15:11:18 -05:00 by GiteaMirror
·
57 comments
Originally created by @ross-rosario on GitHub (Sep 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6756
Originally assigned to: @dhiltgen on GitHub.
What is the issue?
Error: llama runner process has terminated: signal: segmentation fault (core dumped). It occurs while loading larger models that are still within VRAM capacity. Here I'm trying to load command-r:35b-08-2024-q4_K_M (19 GB) on an RX 7900 XTX with 24 GB of VRAM. Smaller models load fine.
Edit: even with gemma2:27b-instruct-q4_K_M (16 GB) I still get the error. It seems the maximum model size that can be loaded is 13 GB, e.g. codestral:22b-v0.1-q4_K_M.
From the logs, ollama clearly says that available VRAM is 23.5 GiB:
Sep 11 09:41:57 computer ollama[71334]: time=2024-09-11T09:41:57.987-06:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1100 driver=0.0 name=1002:744c total="24.0 GiB" available="23.5 GiB"
Error:
Sep 11 09:43:18 computer ollama[71334]: time=2024-09-11T09:43:18.408-06:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
Sep 11 09:43:26 computer ollama[71334]: time=2024-09-11T09:43:26.324-06:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: seg
I could've sworn that models of that size used to load just fine on an older ollama version, but unfortunately I'm not sure which was the latest version that worked.
OS
Linux
GPU
AMD
CPU
AMD
Ollama version
0.3.10
@igorschlum commented on GitHub (Sep 11, 2024):
hi @remon-nashid can you retry using ollama 0.3.10 ?
@ross-rosario commented on GitHub (Sep 11, 2024):
@igorschlum will do as soon as it lands in archlinux packages.
@dhiltgen commented on GitHub (Sep 12, 2024):
I don't have an identical setup, but I tried to repro on windows on an 7900 XTX with 0.3.9 and 0.3.10 and they both load this model OK. @remon-nashid are you specifying any custom parameters like context size? Server logs might help narrow things down as well.
@ross-rosario commented on GitHub (Sep 12, 2024):
Hi @dhiltgen, thanks for looking into this. I'm specifying neither a custom context size nor any other parameters.
Below are the logs from my attempt to ollama run command-r:35b-08-2024-q4_K_M.
@ross-rosario commented on GitHub (Sep 12, 2024):
If it helps, here are the local ROCm-related packages. Also I have 32 GB of ram, out of which 20+ are free.
@dhiltgen commented on GitHub (Sep 12, 2024):
It might be a different crash, but my suspicion is it's memory-prediction related. To work around this until we find the root cause, you can set num_gpu to a smaller value (try 40, 39, ...) or use the new env var OLLAMA_GPU_OVERHEAD to reserve some VRAM so our algorithm calculates fewer layers to load. Let us know how many layers do load successfully, or if this turns out not to be the root cause and the crash isn't memory related.
@ross-rosario commented on GitHub (Sep 12, 2024):
@dhiltgen would you point me to docs about those vars please?
Edit: I'll try OLLAMA_GPU_OVERHEAD as soon as I get my hands on ollama v0.3.10.
@dhiltgen commented on GitHub (Sep 13, 2024):
@remon-nashid you can run ollama serve --help to get a short description of the configuration variables, and some are discussed in more depth in various markdown files in the docs directory.
@ProjectMoon commented on GitHub (Sep 13, 2024):
Found this issue because I was experiencing a similar thing. Gemma2 Q4_K_M would load just fine some time ago, but now crashes in a conversation with a few back and forths between me and the AI. Setting GPU overhead to 1 GiB 'fixes' it. Loads 33 out of 47 layers.
@ross-rosario commented on GitHub (Sep 14, 2024):
Well, I've tried setting OLLAMA_GPU_OVERHEAD=1073741824 (the value is in bytes, right?) but the segmentation fault persists. Perhaps I need to try different values.
btw that's on ollama 0.3.10
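For anyone double-checking the unit: a small sketch (values are only an example to tune up or down, not a recommendation) of computing the byte count with shell arithmetic rather than typing the constant by hand:

```shell
# OLLAMA_GPU_OVERHEAD takes a byte count, so compute it with shell
# arithmetic instead of hand-typing the constant (1 GiB shown here).
OVERHEAD_GIB=1
export OLLAMA_GPU_OVERHEAD=$((OVERHEAD_GIB * 1024 * 1024 * 1024))
echo "$OLLAMA_GPU_OVERHEAD"   # 1073741824
# Then restart the server so it picks the variable up, e.g.:
#   ollama serve
```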
@ProjectMoon commented on GitHub (Sep 14, 2024):
The value is in bytes. Try increasing it until it doesn't crash. And if that doesn't work, manually lower the num_gpu parameter in the modelfile until it loads.
@ross-rosario commented on GitHub (Sep 14, 2024):
Thanks @ProjectMoon for your help, but I wonder: assuming that, due to a bug, ollama doesn't calculate the available VRAM correctly, shouldn't it try to offload some layers to the CPU? It used to behave that way in older versions, but that too seems to have regressed lately.
@ross-rosario commented on GitHub (Sep 15, 2024):
Just wanted to update you guys that I've tried an overhead of up to 10 GB, but it still wouldn't load the model.
@ProjectMoon commented on GitHub (Sep 15, 2024):
It should offload to CPU yes. Especially if you manually lower the number of GPU layers.
But if 10 GB overhead didn't let the model load, it sounds like something else is going on. Either overhead parameter not set correctly, not working in the code, or possibly something else entirely.
Are you using the ollama supplied ROCm or a system installed one?
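The num_gpu workaround discussed above can be sketched as a derived Modelfile that caps the number of offloaded GPU layers. The base model name and the value 40 are only examples; the idea is to lower num_gpu until the model loads without crashing:

```shell
# Sketch of the num_gpu workaround (model name and layer count are
# illustrative): write a Modelfile that caps GPU offload layers.
cat > Modelfile <<'EOF'
FROM command-r:35b-08-2024-q4_K_M
PARAMETER num_gpu 40
EOF
# Then build and run the capped variant:
#   ollama create command-r-limited -f Modelfile
#   ollama run command-r-limited
```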
@kennethjyang commented on GitHub (Sep 17, 2024):
I'm experiencing this as well. I'm using dnf.
I can load Llama 3.1 8B but I can't do any of the 70B models. I'll try the suggestions here, but I think there needs to be some way for Ollama to detect and deal with this.
In the logs I just get a message that says the LLM didn't "respond" and the exit sequence starts.
@kennethjyang commented on GitHub (Sep 17, 2024):
Update: this still did not work, although I can't tell if it did anything. I kept on increasing the GPU overhead and restarted the server but there was no difference in behavior.
@ross-rosario commented on GitHub (Sep 17, 2024):
I'm not aware of ollama-supplied ROCm binaries. I have Arch Linux's package installed: https://archlinux.org/packages/extra/x86_64/ollama-rocm/
@kennethjyang commented on GitHub (Sep 18, 2024):
I'm using the Fedora-supplied ROCm packages. And it works since the log shows it was compatible. I can also run Llama 3.1 8b, just not 70b
@ross-rosario commented on GitHub (Sep 18, 2024):
@dhiltgen One interesting finding revealed by sending requests to ollama in parallel:
- mistral-small:22b and internlm2:20b: Ollama would load one model, generate, unload it, load the next one, generate. Everything is fine.
- gemma2:2b, qwen2:1.5b and finally mistral-small:22b: Ollama loads and runs generation for the smaller models in parallel; by the time it gets to the larger model, a segmentation fault occurs.
@ross-rosario commented on GitHub (Sep 20, 2024):
Any updates regarding this issue? Is there any other debugging info we can provide?
@ross-rosario commented on GitHub (Sep 21, 2024):
I bit the bullet and downgraded to v0.3.6, and things are working again.
@unclemusclez commented on GitHub (Sep 24, 2024):
me too
@dhiltgen commented on GitHub (Sep 24, 2024):
@remon-nashid so it sounds like this is not actually a memory prediction error, but a crash due to some incompatibility between the driver, ROCm, and the llama.cpp code. It sounds like you're on Arch Linux, so installing the latest AMD downstream driver isn't an option for you and you have to stick with the older driver bundled into the Linux kernel. (We've found the AMD downstream driver is more reliable in our testing, but it has limited distro support.) It looks like you're setting HSA_OVERRIDE_GFX_VERSION=11.0.0, but that IS the gfx of your GPU, so I'm a little confused why you're doing that, as it shouldn't be necessary. That setting disables some of our safeguard checks that verify the ROCm library being used supports the detected GPU.
I can't speak to how these packages are built, so given it looks like we're dealing with a runtime incompatibility/mismatch between driver/ROCm/ollama, can you try using our official build with our bundled ROCm? If that fixes it, then we can shift this issue over to the Arch Linux maintainers of the package. If it still fails, then we know it's something else.
@unclemusclez commented on GitHub (Sep 24, 2024):
Does ollama update itself? I have to cp 0.3.6 over each time to get it to start without a segfault. I have no idea why that would be. I'm copying it to /usr/share/ollama
@RayNguyen842857 commented on GitHub (Sep 25, 2024):
I'm running a 7600 XT on NixOS and having the same issue on both 0.3.10 and 0.3.11. Downgrading to 0.3.9 temporarily fixed the issue for me...
@ProjectMoon commented on GitHub (Sep 25, 2024):
I've noticed behavior like this too. It will sometimes seemingly get confused about how it should load or unload models, and try to load a model into the GPU while the GPU is full. Only happens in parallel processing situations. I've had it with two larger models (14 GB+ VRAM) on my main AMD GPU while a 3rd smaller model is running on my secondary ancient NVidia GPU.
@MaciejMogilany commented on GitHub (Sep 25, 2024):
On my AMD APU, a segmentation fault appears whenever --no-mmap is applied on Linux. Ollama adds it in memory-constrained situations, which fits the scenarios described above.
On https://github.com/ollama/ollama/pull/6282 every model, even a small 360M one, gives a segfault on the APU, because --no-mmap is used there all the time to avoid duplicating memory in the shared memory that the APU uses.
I will bet that a recent llama.cpp update changed something.
EDIT: I compiled llama.cpp git 8962422, which current ollama uses, and it works with --no-mmap without any error.
@ProjectMoon commented on GitHub (Sep 25, 2024):
That might explain why I had to turn mmap back on for a bunch of models.
@MaciejMogilany commented on GitHub (Sep 25, 2024):
--no-mmap allows loading big models on ollama 0.3.9 without OOM.
There is another recent issue, from an Nvidia card, with this same pattern of no-mmap and CUDA out of memory: 6864
@unclemusclez commented on GitHub (Sep 25, 2024):
i'm running 70b without --no-mmap on 0.3.9
@MaciejMogilany commented on GitHub (Sep 26, 2024):
Because it fits in GPU VRAM.
Partial offload in memory-constrained situations can cause indefinite memory paging on Linux. And there are other situations: for example, on an APU I am able to load Mistral Large q4 into GTT memory with the --no-mmap flag; without it, the maximum is Mistral Large q2_K on 80 GiB GTT with 96 GiB RAM. Without --no-mmap the CPU buffer often grows to the size of the whole model, doubling the memory requirement: one ROCm buffer plus one CPU buffer. But this is a problem specific to APU shared memory (VRAM is on RAM).
@waltercool commented on GitHub (Sep 26, 2024):
Hi everyone,
The way I have been avoiding some crashes (Gentoo-related issue https://github.com/ollama/ollama/issues/6857) is using Ollama's bundled libraries (LD_LIBRARY_PATH).
While not ideal, it works fine. https://github.com/ollama/ollama/releases/download/v0.3.12/ollama-linux-amd64-rocm.tgz
This is the way I was able to run my tests for PR 6282 https://github.com/ollama/ollama/pull/6282#issuecomment-2375832252
@olly1240 commented on GitHub (Sep 26, 2024):
For me, both the system libraries and the ones bundled in the zip file result in the same segfault behavior; it crashes even with OLLAMA_GPU_OVERHEAD set to ridiculous amounts (3 GB), on a dGPU with 12 GB of VRAM. Previous releases do work fine for me. Arch with packaged ROCm 6.0.2.
@unclemusclez commented on GitHub (Sep 26, 2024):
i notice that the libraries definitely have something to do with it. If i compile from source it is ok, but i still have to restart ollama to get it to not use the packaged libraries from a previous install, i think.
The ENV variables probably have something to do with this. Also, the bundled libraries are an extra 7 GB, so not using them would be ideal.
@ross-rosario commented on GitHub (Oct 4, 2024):
I installed the official build after uninstalling the arch package but unfortunately, the same error is still reproducible.
@carsoncall commented on GitHub (Oct 4, 2024):
+1 for the segmentation fault. I could run the 8b model, but not the 70b.
OS: Fedora 40 (Sway)
GPU: AMD RX 6800
Model: Llama3.1:70b
Ollama version: 3.12
I first tried installing Ollama's packaged ROCm drivers. This didn't seem to make a difference.
I tried Ollama version 3.9, and that was broken as well. Downgrading to 3.6 fixed it.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.3.6 sh
@tbsteinb commented on GitHub (Oct 6, 2024):
+1 here as well.
I'm using the Docker image. I can confirm that if I roll back to 3.6 everything seems to work fine.
Host OS: Gentoo
GPU: AMD RX 7900 XTX
Model: Mixtral:8x7b
Ollama version: ollama:rocm (3.12)
Using ollama:0.3.6-rocm seems to address the issue. Setting OLLAMA_GPU_OVERHEAD would only occasionally work, and only if I set it to ludicrously high values (20 GB), but the end result was that the model would (unsurprisingly) run effectively entirely on the CPU.
@ross-rosario commented on GitHub (Oct 6, 2024):
So obviously one of the changes in v0.3.6...v0.3.8 introduced the bug, but it's hard for my untrained eyes to spot it.
It might well be narrowed down to v0.3.6...v0.3.7, but I didn't test v0.3.7.
@unclemusclez commented on GitHub (Oct 7, 2024):
this could also be a ROCm 6.2.x issue. i think the timeline matches up... apparently not
@olly1240 commented on GitHub (Oct 7, 2024):
This happens also on 6.0.2 so it might not be that
@unclemusclez commented on GitHub (Oct 7, 2024):
does anyone know if this exists on the most recent versions of llama.cpp?
@olly1240 commented on GitHub (Oct 7, 2024):
I quickly ran koboldcpp-rocm with a model that gives me segfaults on ollama and it seems it's working
@unclemusclez commented on GitHub (Oct 8, 2024):
i ran sudo apt install libclblast-dev and it is working. i recompiled from source. gfx906
@dhiltgen commented on GitHub (Oct 8, 2024):
It sounds like some people seeing the regression are able to run v0.3.6. Does anyone have confirmation the regression was in v0.3.7? If so, it may be commit 0b03b9c32f, which brought back a compile flag that fixed problems for some people with multiple Radeon cards; but if that's causing regressions for a broader set of people, maybe we need to back it out.
@olly1240 commented on GitHub (Oct 9, 2024):
Can confirm. ollama 3.6 works, 3.7 segfaults
@unclemusclez commented on GitHub (Oct 9, 2024):
I've noticed that the normal install for Ollama AMD will install its own version of lib/rocm. When I deleted it, I was able to use up to Ollama 0.3.9 when compiled from source.
I managed to get Ollama 0.3.12 to run from source yesterday. It ran 70B, but after I restarted my computer, I had to revert back to 0.3.9.
I mentioned similar issues above a couple of weeks ago where it seems like this is an environment variable issue. I don't understand how newer versions would be able to run otherwise. Perhaps during the build phase the environment is properly set, but upon rebuild, or environment restart, the variable gets lost?
I just want to confirm, i currently have 0.3.9 (compiled from source and with the 0.3.6 lib folder deleted) working live on Ubuntu 24.04 with ROCm 6.2.1.
0.3.6 works out of the box, and I BELIEVE i managed to get 0.3.12 to work, but i cant replicate this.
@dhiltgen commented on GitHub (Oct 9, 2024):
There might be multiple issues lurking in here. The fact that OLLAMA_GPU_OVERHEAD doesn't seem to help implies it's not a simple over-allocation, but a more subtle crash in llama.cpp or ROCm, possibly specific to a combination of ROCm versions and driver versions.
I haven't managed to reproduce so far. What would be helpful to try to narrow this down - could folks who are seeing these crashes enumerate the following details so we can try to see if we can repro?
- an ollama run somemodel invocation sufficient to trigger the crash
The volume of logs is intense, but it may also be helpful to run the server with AMD_LOG_LEVEL=3 and share the logs from around the time of the crash, in case there are any interesting warnings/errors from the ROCm logging.
Please make sure you're running the latest Ollama version so we're not chasing ghosts that were already fixed.
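The requested logging setup might look like the following sketch (the log filename is arbitrary; systemd installs would set these variables via Environment= lines in the unit instead):

```shell
# Sketch: export the verbose-logging variables before reproducing the crash.
export AMD_LOG_LEVEL=3   # verbose ROCm runtime logging
export OLLAMA_DEBUG=1    # verbose ollama logging
echo "$AMD_LOG_LEVEL $OLLAMA_DEBUG"   # 3 1
# Then run the server and capture its output, e.g.:
#   ollama serve 2>&1 | tee ollama-debug.log
```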
@paulchevalier commented on GitHub (Oct 10, 2024):
Didn't comment earlier, but I believe I have been getting this issue as well
Environment variables in the systemd file: HSA_OVERRIDE_GFX_VERSION=10.3.0 OLLAMA_DEBUG=1 AMD_LOG_LEVEL=3
Please find attached the log I get when trying to run llama3.2:1b
ollama.log
From what I understand in the log, the crash happens just after the model is done loading into the GPU.
All these model run fine in CPU (albeit slowly) if I remove the HSA_OVERRIDE_GFX_VERSION variable to disable the GPU.
@weak-kajuma commented on GitHub (Oct 10, 2024):
I have same issue.
My environment:
I tried some versions with docker in my environment.
Using models:
I wish it run on latest version.
@unclemusclez commented on GitHub (Oct 11, 2024):
0.3.13 branch Ubuntu 24.04 ROCm 6.2.1 Failed
@ross-rosario commented on GitHub (Oct 12, 2024):
ollama v0.3.12
Arch Linux
RX 7900 XTX
32 GB
Kernel bundled driver. Kernel v6.11.3.
BYO, v6.0.2. But the issue is reproducible with the official ollama build with its own ROCm.
Just running this without custom parameters causes the crash:
ollama run command-r:35b-08-2024-q4_K_M
Nope.
Logs below
@paulchevalier commented on GitHub (Oct 12, 2024):
Is this solved for everyone? I am still getting the same error.
I just tried with the git version (by using AUR package ollama-rocm-git)
In parallel, I updated to 32 GB of system RAM so that I could run models on the CPU while doing other things, and I am still getting the error.
I can still run many models on CPU.
ollama_2024_10_12.log
@unclemusclez commented on GitHub (Oct 12, 2024):
@paulchevalier this is solved in the main branch currently when you git pull.
@dhiltgen commented on GitHub (Oct 14, 2024):
It's possible there might be a couple distinct root causes in here, but the main issue should be resolved in 0.3.14 when we release that in the coming days. If anyone still hits their failures on that release let us know and we'll continue to investigate.
@walmartbaggg commented on GitHub (Oct 18, 2024):
I am currently getting the same issue with an Instinct MI60. I can run DeepSeek V2 16B and many other models, but once it reaches 27B (tested with Gemma 27B) it causes a segmentation fault.
I did not try old versions, but I will. I have also tried to set OLLAMA_GPU_OVERHEAD. No difference.
latest ollama version*
@walmartbaggg commented on GitHub (Oct 18, 2024):
Oops didnt see this. Hopefully it does fix it.
@ross-rosario commented on GitHub (Oct 20, 2024):
Ollama 0.3.14-rc0 works great! Thanks @dhiltgen!