[GH-ISSUE #434] Crash on M2 Max 32GB RAM when running phind-codellama:34b-q5_K_M #46713

GiteaMirror · 2026-04-27T23:38:29-05:00

Originally created by @spqw on GitHub (Aug 28, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/434

I got a crash with the following logs when running ollama run phind-codellama:34b-q5_K_M on a MacBook Pro M2 Max with 32 GB of memory.

ggml_metal_add_buffer: allocated 'scr1            ' buffer, size =   256.00 MB, (23984.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 8 failed with status 5
GGML_ASSERT: ggml-metal.m:1177: false
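
To make the size of the overshoot explicit, the pair of numbers in parentheses on the warning line can be pulled out with a regular expression. This is only a sketch for reading this particular log format: the pattern and the helper name `overcommit` are illustrative assumptions, not part of ollama or ggml.

```python
import re

# Excerpt copied from the report above.
LOG = """\
ggml_metal_add_buffer: allocated 'scr1            ' buffer, size =   256.00 MB, (23984.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 8 failed with status 5
GGML_ASSERT: ggml-metal.m:1177: false
"""

# Matches the buffer name, its size, and the "(allocated / recommended)" pair.
# The pattern is an assumption based on the lines shown in this issue.
PAIR = re.compile(
    r"allocated '(?P<name>[^']*)' buffer, size =\s*(?P<size>[\d.]+) MB"
    r".*?\((?P<total>[\d.]+) / (?P<limit>[\d.]+)\)"
)

def overcommit(log: str):
    """Return (buffer name, MB above the recommended limit) per matching line."""
    results = []
    for m in PAIR.finditer(log):
        over = float(m["total"]) - float(m["limit"])
        results.append((m["name"].strip(), round(over, 2)))
    return results

print(overcommit(LOG))  # [('scr1', 2139.57)]
```

By the time the last buffer is registered, the cumulative allocation is about 2.1 GB above the recommended working set size reported in the log.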

Please find complete logs below.

Questions

1. How is ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB set?
2. Is that a hard limit?
3. Is there a parameter I can tweak to try and ignore this?
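
As a rough data point for the first question, the logged limit can be compared with the machine's total memory. The sketch below assumes that "MB" in the log means 1024-based units and that the machine exposes exactly 32 GB; it only reports the ratio seen in this log and does not establish how the value is actually derived.

```python
TOTAL_RAM_MB = 32 * 1024      # 32 GB, as stated for this machine (assumed 1024-based)
RECOMMENDED_MB = 21845.34     # recommendedMaxWorkingSetSize from the log
ALLOCATED_MB = 23984.91       # last cumulative figure from ggml_metal_add_buffer

def fraction_of_ram(mb: float, total: float = TOTAL_RAM_MB) -> float:
    """Express a size in MB as a fraction of total RAM."""
    return mb / total

print(f"recommended: {fraction_of_ram(RECOMMENDED_MB):.4f}")  # recommended: 0.6667
print(f"allocated:   {fraction_of_ram(ALLOCATED_MB):.4f}")    # allocated:   0.7320
```

In this log the recommended value works out to about two thirds of 32 GB, while the buffers requested add up to roughly 73% of it.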

Next actions
I will try a more aggressively quantized version and report here.
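
To get a feel for how much smaller a quantization would need to be, the figures from the complete logs can be turned into a simple budget. This is a back-of-the-envelope sketch: it assumes the non-weight buffers (eval, kv, scr0, scr1) keep their current sizes with a different quantization, which the report does not confirm, and `max_weight_fraction` is a hypothetical helper name.

```python
# Figures copied from the ggml_metal_add_buffer lines in the complete logs (MB).
LIMIT_MB = 21845.34             # recommendedMaxWorkingSetSize
DATA_MB = 22939.73              # cumulative total after both 'data' buffers
OTHER_MB = 23984.91 - DATA_MB   # eval + kv + scr0 + scr1, from the final cumulative total

def max_weight_fraction(limit: float = LIMIT_MB,
                        data: float = DATA_MB,
                        other: float = OTHER_MB) -> float:
    """Largest fraction of the current weight buffers that still fits under the limit.

    Assumes (hypothetically) that the non-weight buffers stay the same size.
    """
    return (limit - other) / data

frac = max_weight_fraction()
print(f"weight buffers would need to shrink to about {frac:.1%} of their q5_K_M size")
```

Under these assumptions the weights would have to be roughly 10% smaller than the q5_K_M file to stay within the recommended working set size.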

Complete logs
When running

ollama run phind-codellama:34b-q5_K_M

Then this happens:

llama.cpp: loading model from /Users/lion/.ollama/models/blobs/sha256:454a488edf6348d320b3ba4bc2fdfc98219312e43589b88194fca8ad9b0f1fd0
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 5504
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_head_kv  = 8
llama_model_load_internal: n_layer    = 48
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 8
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: freq_base  = 100000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 17 (mostly Q5_K - Medium)
llama_model_load_internal: model size = 34B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 23392.87 MB (+  384.00 MB per state)
llama_new_context_with_model: kv self size  =  384.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/opt/homebrew/Cellar/ollama/0.0.16/bin/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x12a407f50
ggml_metal_init: loaded kernel_add_row                        0x12a4084d0
ggml_metal_init: loaded kernel_mul                            0x12a408a10
ggml_metal_init: loaded kernel_mul_row                        0x12a409060
ggml_metal_init: loaded kernel_scale                          0x12a4095a0
ggml_metal_init: loaded kernel_silu                           0x12a409ae0
ggml_metal_init: loaded kernel_relu                           0x12a40a020
ggml_metal_init: loaded kernel_gelu                           0x12a40a560
ggml_metal_init: loaded kernel_soft_max                       0x12a40ac30
ggml_metal_init: loaded kernel_diag_mask_inf                  0x12a40b2b0
ggml_metal_init: loaded kernel_get_rows_f16                   0x12a40b950
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x12a40c110
ggml_metal_init: loaded kernel_get_rows_q4_1                  0x12a40c7b0
ggml_metal_init: loaded kernel_get_rows_q2_K                  0x12a305190
ggml_metal_init: loaded kernel_get_rows_q3_K                  0x12a305950
ggml_metal_init: loaded kernel_get_rows_q4_K                  0x12a305ff0
ggml_metal_init: loaded kernel_get_rows_q5_K                  0x12a306690
ggml_metal_init: loaded kernel_get_rows_q6_K                  0x12a306d30
ggml_metal_init: loaded kernel_rms_norm                       0x12a307410
ggml_metal_init: loaded kernel_norm                           0x12a307d70
ggml_metal_init: loaded kernel_mul_mat_f16_f32                0x12a308640
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32               0x12a308d20
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32               0x12a309400
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32               0x12a309c60
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32               0x12a30a340
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32               0x12a30aa20
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32               0x12a30b0e0
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32               0x12a30b9a0
ggml_metal_init: loaded kernel_rope                           0x12a30bee0
ggml_metal_init: loaded kernel_alibi_f32                      0x12a30ca20
ggml_metal_init: loaded kernel_cpy_f32_f16                    0x12a30d2d0
ggml_metal_init: loaded kernel_cpy_f32_f32                    0x12a30db80
ggml_metal_init: loaded kernel_cpy_f16_f16                    0x12a30e310
ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB
ggml_metal_init: hasUnifiedMemory             = true
ggml_metal_init: maxTransferRate              = built-in GPU
llama_new_context_with_model: max tensor size =   205.08 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size = 16384.00 MB, offs =            0
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  6555.28 MB, offs =  16964812800, (22939.73 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =    16.17 MB, (22955.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   386.00 MB, (23341.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scr0            ' buffer, size =   387.00 MB, (23728.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scr1            ' buffer, size =   256.00 MB, (23984.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 8 failed with status 5
GGML_ASSERT: ggml-metal.m:1177: false
SIGABRT: abort
PC=0x19f024764 m=10 sigcode=0
signal arrived during cgo execution

goroutine 27 [syscall]:
runtime.cgocall(0x102aa5070, 0x14000492e28)
	runtime/cgocall.go:157 +0x44 fp=0x14000492df0 sp=0x14000492db0 pc=0x10247e2d4
github.com/jmorganca/ollama/llm._Cfunc_llama_eval(0x14c013600, 0x140003ff0f0, 0x1, 0x0, 0xc)
	_cgo_gotypes.go:216 +0x34 fp=0x14000492e20 sp=0x14000492df0 pc=0x1028098c4
github.com/jmorganca/ollama/llm.newLlama.func6(0x14000000001?, {0x140003ff0f0, 0x1, 0x0?}, 0x0?)
	github.com/jmorganca/ollama/llm/llama.go:293 +0x7c fp=0x14000492e70 sp=0x14000492e20 pc=0x10280ac1c
github.com/jmorganca/ollama/llm.newLlama({0x14000096150, 0x68}, {0x0, 0x0, 0x0?}, {0xffffffffffffffff, 0x0, 0x800, 0xffffffffffffffff, 0x200, ...})
	github.com/jmorganca/ollama/llm/llama.go:293 +0x500 fp=0x140004930e0 sp=0x14000492e70 pc=0x10280aac0
github.com/jmorganca/ollama/llm.New({0x14000096150, 0x68}, {0x0, 0x0, 0x0}, {0xffffffffffffffff, 0x0, 0x800, 0xffffffffffffffff, 0x200, ...})
	github.com/jmorganca/ollama/llm/llm.go:70 +0x408 fp=0x14000493270 sp=0x140004930e0 pc=0x102809348
github.com/jmorganca/ollama/server.load(0x140000fe510, 0x102c7d3c0?, 0x14000136ae0?)
	github.com/jmorganca/ollama/server/routes.go:82 +0x39c fp=0x14000493550 sp=0x14000493270 pc=0x102a98a8c
github.com/jmorganca/ollama/server.GenerateHandler(0x14000482100)
	github.com/jmorganca/ollama/server/routes.go:154 +0x2e8 fp=0x14000493760 sp=0x14000493550 pc=0x102a99188
github.com/gin-gonic/gin.(*Context).Next(...)
	github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0x14000482100)
	github.com/gin-gonic/gin@v1.9.1/recovery.go:102 +0x80 fp=0x140004937b0 sp=0x14000493760 pc=0x102a807b0
github.com/gin-gonic/gin.(*Context).Next(...)
	github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.LoggerWithConfig.func1(0x14000482100)
	github.com/gin-gonic/gin@v1.9.1/logger.go:240 +0xb0 fp=0x14000493960 sp=0x140004937b0 pc=0x102a7fb50
github.com/gin-gonic/gin.(*Context).Next(...)
	github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0x1400036cd00, 0x14000482100)
	github.com/gin-gonic/gin@v1.9.1/gin.go:620 +0x524 fp=0x14000493af0 sp=0x14000493960 pc=0x102a7ec84
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0x1400036cd00, {0x102d82050?, 0x140003d40e0}, 0x14000482200)
	github.com/gin-gonic/gin@v1.9.1/gin.go:576 +0x1a0 fp=0x14000493b30 sp=0x14000493af0 pc=0x102a7e5d0
net/http.serverHandler.ServeHTTP({0x102d801e0?}, {0x102d82050?, 0x140003d40e0?}, 0x6?)
	net/http/server.go:2938 +0xbc fp=0x14000493b60 sp=0x14000493b30 pc=0x10270a38c
net/http.(*conn).serve(0x140000ff0e0, {0x102d83838, 0x14000403ef0})
	net/http/server.go:2009 +0x518 fp=0x14000493fa0 sp=0x14000493b60 pc=0x102706788
net/http.(*Server).Serve.func3()
	net/http/server.go:3086 +0x30 fp=0x14000493fd0 sp=0x14000493fa0 pc=0x10270aaa0
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000493fd0 sp=0x14000493fd0 pc=0x1024e3c04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3086 +0x4cc

goroutine 1 [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x1400051b5b0 sp=0x1400051b590 pc=0x1024b28b8
runtime.netpollblock(0x1400051b648?, 0x2574664?, 0x1?)
	runtime/netpoll.go:564 +0x158 fp=0x1400051b5f0 sp=0x1400051b5b0 pc=0x1024abfa8
internal/poll.runtime_pollWait(0x12a0dfba0, 0x72)
	runtime/netpoll.go:343 +0xa0 fp=0x1400051b620 sp=0x1400051b5f0 pc=0x1024dd7b0
internal/poll.(*pollDesc).wait(0x1400040a680?, 0x0?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400051b650 sp=0x1400051b620 pc=0x10256fcc8
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x1400040a680)
	internal/poll/fd_unix.go:611 +0x250 fp=0x1400051b700 sp=0x1400051b650 pc=0x102574750
net.(*netFD).accept(0x1400040a680)
	net/fd_unix.go:172 +0x28 fp=0x1400051b7c0 sp=0x1400051b700 pc=0x1025d5e88
net.(*TCPListener).accept(0x140003e79c0)
	net/tcpsock_posix.go:152 +0x28 fp=0x1400051b7f0 sp=0x1400051b7c0 pc=0x1025e9ef8
net.(*TCPListener).Accept(0x140003e79c0)
	net/tcpsock.go:315 +0x2c fp=0x1400051b830 sp=0x1400051b7f0 pc=0x1025e90dc
net/http.(*onceCloseListener).Accept(0x140000ff0e0?)
	<autogenerated>:1 +0x30 fp=0x1400051b850 sp=0x1400051b830 pc=0x10272c3c0
net/http.(*Server).Serve(0x14000332ff0, {0x102d81e40, 0x140003e79c0})
	net/http/server.go:3056 +0x2b8 fp=0x1400051b980 sp=0x1400051b850 pc=0x10270a748
github.com/jmorganca/ollama/server.Serve({0x102d81e40, 0x140003e79c0}, {0x0, 0x0, 0x0})
	github.com/jmorganca/ollama/server/routes.go:457 +0x6cc fp=0x1400051bc50 sp=0x1400051b980 pc=0x102a9c86c
github.com/jmorganca/ollama/cmd.RunServer(0x1400042e200?, {0x102afd6d9?, 0x4?, 0x102afd699?})
	github.com/jmorganca/ollama/cmd/cmd.go:621 +0x1f4 fp=0x1400051bd10 sp=0x1400051bc50 pc=0x102aa2c04
github.com/spf13/cobra.(*Command).execute(0x140003c5500, {0x103291940, 0x0, 0x0})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x658 fp=0x1400051be50 sp=0x1400051bd10 pc=0x1027b3eb8
github.com/spf13/cobra.(*Command).ExecuteC(0x140003c4c00)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x1400051bf10 sp=0x1400051be50 pc=0x1027b45e0
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/jmorganca/ollama/main.go:11 +0x54 fp=0x1400051bf30 sp=0x1400051bf10 pc=0x102aa47d4
runtime.main()
	runtime/proc.go:267 +0x2bc fp=0x1400051bfd0 sp=0x1400051bf30 pc=0x1024b248c
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x1400051bfd0 sp=0x1400051bfd0 pc=0x1024e3c04

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000056f90 sp=0x14000056f70 pc=0x1024b28b8
runtime.goparkunlock(...)
	runtime/proc.go:404
runtime.forcegchelper()
	runtime/proc.go:322 +0xb8 fp=0x14000056fd0 sp=0x14000056f90 pc=0x1024b2748
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000056fd0 sp=0x14000056fd0 pc=0x1024e3c04
created by runtime.init.6 in goroutine 1
	runtime/proc.go:310 +0x24

goroutine 18 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000052760 sp=0x14000052740 pc=0x1024b28b8
runtime.goparkunlock(...)
	runtime/proc.go:404
runtime.bgsweep(0x0?)
	runtime/mgcsweep.go:321 +0x108 fp=0x140000527b0 sp=0x14000052760 pc=0x10249f0b8
runtime.gcenable.func1()
	runtime/mgc.go:200 +0x28 fp=0x140000527d0 sp=0x140000527b0 pc=0x102493b08
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000527d0 sp=0x140000527d0 pc=0x1024e3c04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:200 +0x6c

goroutine 19 [GC scavenge wait]:
runtime.gopark(0x14000096000?, 0x102c2a228?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000052f50 sp=0x14000052f30 pc=0x1024b28b8
runtime.goparkunlock(...)
	runtime/proc.go:404
runtime.(*scavengerState).park(0x1031d1c00)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000052f80 sp=0x14000052f50 pc=0x10249c8ac
runtime.bgscavenge(0x0?)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000052fb0 sp=0x14000052f80 pc=0x10249ce6c
runtime.gcenable.func2()
	runtime/mgc.go:201 +0x28 fp=0x14000052fd0 sp=0x14000052fb0 pc=0x102493aa8
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000052fd0 sp=0x14000052fd0 pc=0x1024e3c04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:201 +0xac

goroutine 20 [finalizer wait]:
runtime.gopark(0x1400008a820?, 0x1a0?, 0xe8?, 0x65?, 0x10276947c?)
	runtime/proc.go:398 +0xc8 fp=0x14000056580 sp=0x14000056560 pc=0x1024b28b8
runtime.runfinq()
	runtime/mfinal.go:193 +0x108 fp=0x140000567d0 sp=0x14000056580 pc=0x102492bf8
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000567d0 sp=0x140000567d0 pc=0x1024e3c04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:163 +0x80

goroutine 3 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000057730 sp=0x14000057710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000577d0 sp=0x14000057730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000577d0 sp=0x140000577d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 24 [GC worker (idle)]:
runtime.gopark(0x1?, 0x140003d9310?, 0xa8?, 0x37?, 0x102700c48?)
	runtime/proc.go:398 +0xc8 fp=0x14000053730 sp=0x14000053710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000537d0 sp=0x14000053730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000537d0 sp=0x140000537d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 4 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000057f30 sp=0x14000057f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000057fd0 sp=0x14000057f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000057fd0 sp=0x14000057fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 34 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000506730 sp=0x14000506710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140005067d0 sp=0x14000506730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140005067d0 sp=0x140005067d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 35 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000506f30 sp=0x14000506f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000506fd0 sp=0x14000506f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000506fd0 sp=0x14000506fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 5 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000058730 sp=0x14000058710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000587d0 sp=0x14000058730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000587d0 sp=0x140000587d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 36 [GC worker (idle)]:
runtime.gopark(0x2625f613a5?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000507730 sp=0x14000507710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140005077d0 sp=0x14000507730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140005077d0 sp=0x140005077d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 37 [GC worker (idle)]:
runtime.gopark(0x2625f5c48b?, 0x3?, 0xe2?, 0xf7?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000507f30 sp=0x14000507f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000507fd0 sp=0x14000507f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000507fd0 sp=0x14000507fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 25 [GC worker (idle)]:
runtime.gopark(0x2625f5bbea?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000053f30 sp=0x14000053f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000053fd0 sp=0x14000053f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000053fd0 sp=0x14000053fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 6 [GC worker (idle)]:
runtime.gopark(0x103293620?, 0x1?, 0xb7?, 0x85?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000058f30 sp=0x14000058f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000058fd0 sp=0x14000058f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000058fd0 sp=0x14000058fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x2625f4e46c?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000059730 sp=0x14000059710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000597d0 sp=0x14000059730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000597d0 sp=0x140000597d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 26 [GC worker (idle)]:
runtime.gopark(0x2625f4d8dd?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000054730 sp=0x14000054710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000547d0 sp=0x14000054730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000547d0 sp=0x140000547d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 9 [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1024f8a90?)
	runtime/proc.go:398 +0xc8 fp=0x14000508540 sp=0x14000508520 pc=0x1024b28b8
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:564 +0x158 fp=0x14000508580 sp=0x14000508540 pc=0x1024abfa8
internal/poll.runtime_pollWait(0x12a0dfaa8, 0x72)
	runtime/netpoll.go:343 +0xa0 fp=0x140005085b0 sp=0x14000508580 pc=0x1024dd7b0
internal/poll.(*pollDesc).wait(0x1400007e000?, 0x140004348e1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005085e0 sp=0x140005085b0 pc=0x10256fcc8
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400007e000, {0x140004348e1, 0x1, 0x1})
	internal/poll/fd_unix.go:164 +0x200 fp=0x14000508680 sp=0x140005085e0 pc=0x102571010
net.(*netFD).Read(0x1400007e000, {0x140004348e1?, 0x102ccfaa0?, 0x102ce4c80?})
	net/fd_posix.go:55 +0x28 fp=0x140005086d0 sp=0x14000508680 pc=0x1025d4278
net.(*conn).Read(0x140000aee90, {0x140004348e1?, 0x1?, 0x140003d8050?})
	net/net.go:179 +0x34 fp=0x14000508720 sp=0x140005086d0 pc=0x1025e1744
net.(*TCPConn).Read(0x140004348d0?, {0x140004348e1?, 0x140003d8050?, 0x0?})
	<autogenerated>:1 +0x2c fp=0x14000508750 sp=0x14000508720 pc=0x1025f2ddc
net/http.(*connReader).backgroundRead(0x140004348d0)
	net/http/server.go:683 +0x40 fp=0x140005087b0 sp=0x14000508750 pc=0x102700d20
net/http.(*connReader).startBackgroundRead.func2()
	net/http/server.go:679 +0x28 fp=0x140005087d0 sp=0x140005087b0 pc=0x102700c48
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140005087d0 sp=0x140005087d0 pc=0x1024e3c04
created by net/http.(*connReader).startBackgroundRead in goroutine 27
	net/http/server.go:679 +0xc8

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x0
r5      0x1721eabe0
r6      0xa
r7      0x0
r8      0xd49f50ddc5c353f6
r9      0xd49f50dcb7dde3f6
r10     0x2
r11     0xfffffffd
r12     0x10000000000
r13     0x0
r14     0x0
r15     0x0
r16     0x148
r17     0x1fec033a0
r18     0x0
r19     0x6
r20     0x1721eb000
r21     0x1803
r22     0x1721eb0e0
r23     0x14d800020
r24     0x60000187f780
r25     0xc
r26     0x6f4
r27     0x600003657c20
r28     0xc
r29     0x1721eab90
lr      0x19f05bc28
sp      0x1721eab70
pc      0x19f024764
fault   0x19f024764

runtime/proc.go:398 +0xc8 fp=0x14000057f30 sp=0x14000057f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000057fd0 sp=0x14000057f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000057fd0 sp=0x14000057fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 34 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000506730 sp=0x14000506710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140005067d0 sp=0x14000506730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140005067d0 sp=0x140005067d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 35 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000506f30 sp=0x14000506f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000506fd0 sp=0x14000506f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000506fd0 sp=0x14000506fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 5 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000058730 sp=0x14000058710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140000587d0 sp=0x14000058730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140000587d0 sp=0x140000587d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 36 [GC worker (idle)]: runtime.gopark(0x2625f613a5?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:398 +0xc8 fp=0x14000507730 sp=0x14000507710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140005077d0 sp=0x14000507730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140005077d0 sp=0x140005077d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 37 [GC worker (idle)]: runtime.gopark(0x2625f5c48b?, 0x3?, 0xe2?, 0xf7?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000507f30 sp=0x14000507f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000507fd0 sp=0x14000507f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000507fd0 sp=0x14000507fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 25 [GC worker (idle)]: runtime.gopark(0x2625f5bbea?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000053f30 sp=0x14000053f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000053fd0 sp=0x14000053f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000053fd0 sp=0x14000053fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 6 [GC worker (idle)]: runtime.gopark(0x103293620?, 0x1?, 0xb7?, 0x85?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000058f30 sp=0x14000058f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000058fd0 sp=0x14000058f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000058fd0 sp=0x14000058fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 7 [GC worker (idle)]: runtime.gopark(0x2625f4e46c?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:398 +0xc8 fp=0x14000059730 sp=0x14000059710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140000597d0 sp=0x14000059730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140000597d0 sp=0x140000597d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 26 [GC worker (idle)]: runtime.gopark(0x2625f4d8dd?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000054730 sp=0x14000054710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140000547d0 sp=0x14000054730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140000547d0 sp=0x140000547d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 9 [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1024f8a90?) runtime/proc.go:398 +0xc8 fp=0x14000508540 sp=0x14000508520 pc=0x1024b28b8 runtime.netpollblock(0x0?, 0x0?, 0x0?) runtime/netpoll.go:564 +0x158 fp=0x14000508580 sp=0x14000508540 pc=0x1024abfa8 internal/poll.runtime_pollWait(0x12a0dfaa8, 0x72) runtime/netpoll.go:343 +0xa0 fp=0x140005085b0 sp=0x14000508580 pc=0x1024dd7b0 internal/poll.(*pollDesc).wait(0x1400007e000?, 0x140004348e1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005085e0 sp=0x140005085b0 pc=0x10256fcc8 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x1400007e000, {0x140004348e1, 0x1, 0x1}) internal/poll/fd_unix.go:164 +0x200 fp=0x14000508680 sp=0x140005085e0 pc=0x102571010 net.(*netFD).Read(0x1400007e000, {0x140004348e1?, 0x102ccfaa0?, 0x102ce4c80?}) net/fd_posix.go:55 +0x28 fp=0x140005086d0 sp=0x14000508680 pc=0x1025d4278 net.(*conn).Read(0x140000aee90, {0x140004348e1?, 0x1?, 0x140003d8050?}) net/net.go:179 +0x34 fp=0x14000508720 sp=0x140005086d0 pc=0x1025e1744 net.(*TCPConn).Read(0x140004348d0?, {0x140004348e1?, 0x140003d8050?, 0x0?}) <autogenerated>:1 +0x2c fp=0x14000508750 sp=0x14000508720 pc=0x1025f2ddc net/http.(*connReader).backgroundRead(0x140004348d0) net/http/server.go:683 +0x40 fp=0x140005087b0 sp=0x14000508750 pc=0x102700d20 net/http.(*connReader).startBackgroundRead.func2() net/http/server.go:679 +0x28 fp=0x140005087d0 sp=0x140005087b0 pc=0x102700c48 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140005087d0 sp=0x140005087d0 pc=0x1024e3c04 created by net/http.(*connReader).startBackgroundRead in goroutine 27 net/http/server.go:679 +0xc8 r0 0x0 r1 0x0 r2 0x0 r3 0x0 r4 0x0 r5 0x1721eabe0 r6 0xa r7 0x0 r8 0xd49f50ddc5c353f6 r9 0xd49f50dcb7dde3f6 r10 0x2 r11 0xfffffffd r12 0x10000000000 r13 0x0 r14 0x0 r15 0x0 r16 0x148 r17 0x1fec033a0 r18 0x0 r19 0x6 r20 0x1721eb000 r21 0x1803 r22 0x1721eb0e0 r23 0x14d800020 r24 0x60000187f780 r25 0xc r26 0x6f4 r27 0x600003657c20 r28 0xc r29 0x1721eab90 lr 0x19f05bc28 sp 0x1721eab70 pc 0x19f024764 fault 0x19f024764 ```

GiteaMirror added the bug label 2026-04-27 23:38:30 -05:00

GiteaMirror closed this issue

2026-04-27 23:39:40 -05:00

GiteaMirror commented

2026-04-27 23:39:41 -05:00

@mchiang0610 commented on GitHub (Aug 30, 2023):

@spqw Would it be possible to see if you are still having issues with the latest Ollama (v0.0.17)? Sorry about that.

If you still have trouble with the latest version, please feel free to re-open this!

Questions:

1. That is the recommendation by Apple; not set by Ollama; set based on what is loaded into the GPU.
2. No, there is no software hard limit.
3. You can reduce the context size, which will reduce the amount of memory it uses. If you are off by a lot, it'll still error.

In the updated version, if you are way over the memory limit, we'll warn you.
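To put rough numbers on answer 3, here is a back-of-the-envelope sketch in Python. It uses only values printed in the log of the original report (`n_layer`, `n_embd`, `n_head`, `n_head_kv`, and the `23984.91 / 21845.34` pair). The f16 key/value cache (2 bytes per element) is an assumption, as are the helper names `kv_cache_mib` and `overage_mb`, which are made up for illustration. Scratch and compute buffers are not modeled, and the log's "MB" figures are treated as the same unit as the computed MiB, so treat the result as an estimate rather than what Ollama actually allocates.

```python
# Values copied from the log in the original report.
N_LAYER = 48
N_EMBD = 8192
N_HEAD = 64
N_HEAD_KV = 8
ALLOCATED_MB = 23984.91    # allocated size reported by ggml_metal_add_buffer
RECOMMENDED_MB = 21845.34  # recommendedMaxWorkingSetSize reported by Metal

# Assumption (not stated in the log): the KV cache stores f16 values.
BYTES_PER_ELEMENT = 2


def kv_cache_mib(n_ctx: int) -> float:
    """Estimate the key/value cache size in MiB for a given context length."""
    head_dim = N_EMBD // N_HEAD      # 8192 / 64 = 128
    kv_width = head_dim * N_HEAD_KV  # grouped-query attention: 128 * 8 = 1024
    # Factor 2: one tensor for keys and one for values, per layer.
    total_bytes = 2 * N_LAYER * n_ctx * kv_width * BYTES_PER_ELEMENT
    return total_bytes / (1024 * 1024)


def overage_mb() -> float:
    """How far the allocation exceeded the recommended working set."""
    return ALLOCATED_MB - RECOMMENDED_MB


if __name__ == "__main__":
    baseline = kv_cache_mib(2048)  # n_ctx from the log
    print(f"over the recommended working set by {overage_mb():.2f} MB")
    for n_ctx in (2048, 1024, 512):
        saved = baseline - kv_cache_mib(n_ctx)
        print(f"n_ctx={n_ctx}: KV cache ~{kv_cache_mib(n_ctx):.0f} MiB, "
              f"saves ~{saved:.0f} MiB")
```

Under these assumptions the cache is about 384 MiB at the logged context of 2048, so even dropping to 512 frees only a few hundred MiB, while the allocation was more than 2 GB over the recommendation. That is consistent with "if you are off by a lot, it'll still error". In Ollama the context length is set with the `num_ctx` parameter.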

