Mirror of https://github.com/ollama/ollama.git (synced 2026-05-06 08:02:14 -05:00)
[GH-ISSUE #8571] running deepseek r1 671b on 64GB / 128GB ram mac gives Error: llama runner process has terminated: signal: killed
#52047
Closed
opened 2026-04-28 21:43:50 -05:00 by GiteaMirror · 38 comments
Originally created by @duttaoindril on GitHub (Jan 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8571
What is the issue?
After waiting all day for the model to download, `ollama run deepseek-r1:671b` fails with `Error: llama runner process has terminated: signal: killed`. I can run the deepseek-r1:70b llama model just fine.
I'm running a MacBook M3 Pro with 64GB of RAM; I'm assuming it's failing due to lack of memory?
OS
macOS
GPU
Apple
CPU
Apple
Ollama version
0.5.7
@rick-github commented on GitHub (Jan 25, 2025):
You will need to add a lot of swap to run this model.
@rick-github commented on GitHub (Jan 25, 2025):
It would be interesting to see the server logs because I would have expected ollama to refuse to load the model in the face of insufficient resources.
@duttaoindril commented on GitHub (Jan 25, 2025):
copied as much as I could
@duttaoindril commented on GitHub (Jan 25, 2025):
lol how do I add almost 350GB of swap?
I don't think I have even that much storage left 😅
@rick-github commented on GitHub (Jan 25, 2025):
So 322G in system RAM and 63G in GPU. I believe macOS has dynamically allocated swap, so it grows automatically, which I guess is why ollama didn't reject it outright. Maybe that's also why it eventually failed: it may have hit some system limit.
`signal: killed` seems like an active policy took the runner out. Do you have system logs for kernel issues on macOS?
@rick-github commented on GitHub (Jan 25, 2025):
Or maybe not: free swap was reported as 0. I'm afraid virtual memory management on macOS is a mystery to me; you'll have to dig out the manual or do some internet searches to figure out how to expand your swap to get the model loaded.
@SeekPoint commented on GitHub (Jan 26, 2025):
Can I run ds-r1 671b on 4×2080ti@22GB cards and 512GB of CPU memory?
@rick-github commented on GitHub (Jan 26, 2025):
Theoretically, yes. It won't be very fast, though.
@neuhaus commented on GitHub (Jan 28, 2025):
@duttaoindril what outcome do you expect? You have way too little memory for this huge model. Running it using swap works in theory, but inference will be extremely slow, completely unusable. You are wasting your (and everyone else's) time.
To answer your question: this model has 671 billion weights, and q4 means they are ~4 bits each, so ~336GB of RAM is needed just to load the model, and you also need more RAM for context.
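The arithmetic behind that figure, as a quick sketch (it counts weights only; quantization formats carry some metadata overhead, and the KV cache for context comes on top):

```python
# Back-of-the-envelope memory for quantized weights only (no context/KV
# cache, no per-format overhead), so real usage is strictly higher.
params = 671e9          # DeepSeek R1: 671 billion weights
bits_per_weight = 4     # q4 quantization
bytes_needed = params * bits_per_weight / 8
print(f"{bytes_needed / 1e9:.0f} GB")  # ~336 GB
```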
@fserb commented on GitHub (Jan 28, 2025):
I don't think that's what's going on. This is not just a "there's not enough RAM you are going to swap and being killed" issue.
I got a compressed model from https://huggingface.co/unsloth/DeepSeek-R1-GGUF (it's 150GB instead of 404GB). I had to merge it (because ollama doesn't support split GGUF yet), but then, if I try to `ollama run` it, I get the same error as OP (`Error: llama runner process has terminated: signal: killed`) with similar logs.
But if I run the same model file with llama-cpp (actual line: `llama-cli --model DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf --cache-type-k q4_0 --threads 12 -no-cnv --n-gpu-layers 7 --prio 2 --temp 0.6 --ctx-size 8192 --seed 3407`), it works fine on the same machine. (By fine I mean a couple of tokens per second.)
ollama killed logs
llama-cpp logs
I'm not 100% sure, but it seems to me that ollama is choosing the wrong number of layers to offload? (Which then may be causing the auto-kill.)
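For reference, the merge step fserb mentions can be done with llama.cpp's gguf-split tool. A hedged sketch, reusing the shard names from the comment (verify the binary name and flags against your llama.cpp build; the output file name is assumed):

```python
import subprocess

# Merge a sharded GGUF into a single file; the tool discovers the
# remaining shards from the first one.
subprocess.run([
    "llama-gguf-split", "--merge",
    "DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",  # first shard
    "DeepSeek-R1-merged.gguf",                   # merged output (name assumed)
], check=True)
```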
@rick-github commented on GitHub (Jan 28, 2025):
ollama estimates the number of layers it can offload and passes that figure to llama.cpp, which actually allocates the memory. Each architecture has its unique way of doing things, and not all of them are encoded in ollama's memory calculations. As these models get larger and larger, what was previously a minor overage is magnified into a large overage, killing the runner when it tries to allocate memory.
As a test, if you limit the layer count in ollama with `num_gpu: 7`, then the model should load. Conversely, if you try to run llama-cpp with `--n-gpu-layers 21`, then it should die.
@fserb commented on GitHub (Jan 28, 2025):
I thought `num_gpu` had been removed. Is it just gone from the documentation?
@rick-github commented on GitHub (Jan 28, 2025):
The documentation is going through a refresh; `num_gpu` was removed from the modelfile documentation in e54a3c7fcd, but it's still a configurable parameter.
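For anyone repeating the test, `num_gpu` can be passed per request through ollama's REST API options; a minimal sketch (model name and prompt are placeholders):

```python
import json, urllib.request

# Ask ollama to offload only 7 layers to the GPU for this request.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "deepseek-r1:671b",  # placeholder
        "prompt": "hello",
        "stream": False,
        "options": {"num_gpu": 7},    # the layer count under test
    }).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```

The same parameter can also be set interactively with `/set parameter num_gpu 7` inside `ollama run`, or with a `PARAMETER num_gpu 7` line in a Modelfile, which is what fserb does below.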
@fserb commented on GitHub (Jan 28, 2025):
llama-cpp with `--n-gpu-layers 21` does crash with OOM, but it has a more explicit error:
ollama with `PARAMETER num_gpu 7` still crashes, even though the "Metal model buffer size" is much smaller than total RAM (64GB). It also crashes with `num_gpu: 1`.
Could it be that ollama is using the merged version and llama-cpp is using the sliced one?
@fserb commented on GitHub (Jan 28, 2025):
I wonder if this has something to do with "CPU model buffer size", i.e., is ollama still loading it in RAM? As opposed to the llama-cpp log, where RAM usage is much smaller?
@rick-github commented on GitHub (Jan 28, 2025):
Does ollama run it if `num_gpu: 0`?
@fserb commented on GitHub (Jan 29, 2025):
crashes.
Is there an assumption somewhere that the model must at least fit in RAM?
@rick-github commented on GitHub (Jan 29, 2025):
OK, ruled out GPU as a problem. I see that llama-cpp is being run with a quantized 8192 cache; is that the same for ollama? Can you try setting `OLLAMA_DEBUG=1` in the ollama server environment? I'm not sure if it will show anything useful, but it's worth a try.
There's a calculation that size of model + size of context + size of graph < size of free RAM + size of free swap. This is why I was surprised earlier that ollama went ahead and tried to load the model when, on the face of it, there wasn't enough free memory. But I'm not a Mac person, so I don't know the underlying details.
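A minimal sketch of that inequality (my own illustration, not ollama's actual code):

```python
def model_fits(model_bytes: int, context_bytes: int, graph_bytes: int,
               free_ram_bytes: int, free_swap_bytes: int) -> bool:
    """Load only if model + context + graph fit in free RAM + free swap."""
    return model_bytes + context_bytes + graph_bytes < free_ram_bytes + free_swap_bytes

# With the figures reported later in this thread (~172G wanted, 50G free
# RAM, 0G free swap), the check would have refused to load:
print(model_fits(172 << 30, 0, 0, 50 << 30, 0))  # False
```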
@fserb commented on GitHub (Jan 29, 2025):
I'm not sure how to set the cache. I don't see anything in the logs about KV cache init.
I did `OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q4_0 OLLAMA_DEBUG=1 ollama serve`, but then:
@rick-github commented on GitHub (Jan 29, 2025):
Could you add a full ollama log? The log snippets so far are missing details on device detection, memory calculations, context size, etc. that could be helpful.
@Auto-Rooter commented on GitHub (Jan 29, 2025):
Got the same issue on a MacBook Pro, M3 Max, 128 GB RAM, with 600 GB of free storage.
@duttaoindril commented on GitHub (Jan 29, 2025):
https://news.ycombinator.com/item?id=42850222
Is it possible to add a 1.58-bit 671b modelfile on ollama?
They describe it as:
@rick-github commented on GitHub (Jan 29, 2025):
#8624
@fserb commented on GitHub (Jan 29, 2025):
(Still from the local merged file)
log
@rick-github commented on GitHub (Jan 29, 2025):
The system has 50G of free RAM and 0G of free swap, and wants to use 172G to load the model. I had a look at the code, and it turns out that macOS explicitly doesn't do the calculation I mentioned earlier, because "Darwin has fully dynamic swap".
The runner didn't log anything about running out of memory, and the server just noted that "llama runner process has terminated: signal: killed", i.e. the runner received a SIGKILL. This leads me to think that an external actor killed the runner: for example, the OS may have decided that the runner was asking for too much memory, or it wasn't able to expand the dynamic swap fast enough, or some other kernel-level mechanism kicked in and decided to terminate the runner. These sorts of events are usually logged somewhere; on macOS it might be in the Console.
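One way to look for such an event is macOS's unified logging via the `log show` CLI; a hedged sketch (the predicate string is only a guess at what a memory kill might mention, so adjust it):

```python
import subprocess

# Scan the last 10 minutes of unified logs for kill-related messages.
# `log show --last --predicate` is standard macOS tooling; the predicate
# itself is an assumption for illustration.
subprocess.run([
    "log", "show", "--last", "10m",
    "--predicate", 'eventMessage CONTAINS[c] "kill"',
])
```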
@fserb commented on GitHub (Jan 29, 2025):
Yeah, yeah. That's definitely true. It does get killed by the OS due to memory usage.
But the issue is that llama-cpp realizes that and doesn't try to load the whole model into RAM? And also that it uses the KV cache?
@rick-github commented on GitHub (Jan 29, 2025):
ollama is loading the model no-mmap. Try
@teekuningas commented on GitHub (Jan 29, 2025):
Just to hop in with my 250G RAM:
This and much bigger models run fine with llama.cpp. I guess it memory-maps the .gguf file and does not load all the weights into RAM simultaneously?
Just to ask: is it by design that ollama has this kind of hard limit for memory, or am I just missing something? I really love ollama for its ability to juggle different models, and the fact that llama.cpp is used underneath suggests there would be a chance for ollama to use the same smart mechanism llama.cpp uses with limited RAM.
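The mmap intuition is broadly right; a self-contained sketch of the mechanism (hypothetical file path), showing that mapping a file costs almost no resident memory until pages are touched:

```python
import mmap

# Map a large file read-only: pages are faulted in from disk on first
# access rather than read up front.
with open("model.gguf", "rb") as f:  # hypothetical path
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic = mm[:4]   # touching bytes pages in only what's needed
    print(magic)     # GGUF files begin with b"GGUF"
    mm.close()
```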
"@ duttaoindril what outcome do your expect? You have way to little memory for this huge model. Running it with using swap works in theory, but inference will be slow to the extreme, completely unusable. You are wasting your (and everyone else's) time."
@neuhaus: generating with a model that does not fit into RAM is not useless. It might take like 2 tokens / s, but all tasks aren't like code completion where you need lightning speed. For some tasks you may be happy to wait for a few minutes to get a good result from a big model instead of getting a bad result with high speed from a lesser model.
@fserb commented on GitHub (Jan 29, 2025):
and I could see in the logs
after that
`ollama run deepseek-r1:iq1_s` worked (as in, it executed without crashing). The mmap only works for `num_gpu: 0`; is that WAI?
@rick-github commented on GitHub (Jan 29, 2025):
As in, with `num_gpu: 1` the server uses mmap (i.e., no --no-mmap on the command line) but is still killed by the OS? I don't think that is WAI, but I'd have to check.
The way that ollama uses mmap has always bugged me, and I've been meaning to go through the code for ages but never got around to it. The arrival of these ginormous models has given me extra incentive.
@fserb commented on GitHub (Jan 29, 2025):
It doesn't seem to use mmap at all. It goes back to `llm_load_tensors: CPU model buffer size = 133730.06 MiB` and something small on the GPU. Then it starts loading and crashes the same way as before (at "model load progress 0.70").
@fserb commented on GitHub (Jan 29, 2025):
Also, maybe I'm being weird, but the output of the model seems, hmmm, a bit off?
or:
@rick-github commented on GitHub (Jan 29, 2025):
iq1_s is a lot of quantization, so I wouldn't be surprised by random output. However, the examples you give are indicative of a missing template: the prompt being sent to the model doesn't have the <User> and <Assistant> tokens that guide its output. Earlier you showed `ollama create deepseek-r1:iq1_s`, so it looks like you have a custom Modelfile. Does it have a TEMPLATE field, and if so, what's in it?
For comparison, I ran iq1_s out of swap and it worked fine, if slowly.
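To illustrate what a missing TEMPLATE changes, a hedged sketch of the wrapping a chat template performs (the exact marker strings belong to the model's tokenizer; the ones below follow DeepSeek R1's convention and are assumptions here):

```python
# Without a TEMPLATE the raw prompt goes straight to the model; with one,
# it is wrapped in the chat markers the model was trained on.
def render(prompt: str, system: str = "") -> str:
    return f"{system}<|User|>{prompt}<|Assistant|>"

print(render("Why is the sky blue?"))
# -> <|User|>Why is the sky blue?<|Assistant|>
```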
@sharpe5 commented on GitHub (Feb 1, 2025):
Your best bet is to run DeepSeek R1 Dynamic 1.58-bit, as it will fit into 128GB of RAM on a Mac.
This model was released 4 days ago. It selectively quantises some layers to 1.58 bits, producing a model that is 131GB, an 80% reduction in size; it leaves some layers at 6 bits to avoid model collapse. You are probably having problems because the model is too large to fit into RAM. Instructions to run on Mac:
I just ran this same model using LM Studio on Windows 10 on an E5-2695v3 Xeon with 256GB RAM. There is a version of LM Studio for Mac as well. It worked nicely and used the advertised amount of RAM. It ran at 1.01 tokens/sec; offloading 9 layers to my RTX 3090 with 24GB of RAM increased this to 1.10 tokens/sec. I prefer the free version of ChatBox as a front end.
How many tokens/sec do you get on your Mac with 128GB RAM, and what is the exact spec?
@ZJL0111 commented on GitHub (Feb 8, 2025):
Have you solved it? I encountered the same error; my device is 8×A800.
@rick-github commented on GitHub (Feb 8, 2025):
https://github.com/ollama/ollama/issues/5975
@afsara-ben commented on GitHub (Mar 17, 2025):
I am still getting `Error: POST predict: Post "http://127.0.0.1:49185/completion": EOF` on my 192GB Mac Ultra. How do I fix this?
@rick-github commented on GitHub (Mar 17, 2025):
https://github.com/ollama/ollama/issues/5975#issuecomment-2306851184