[GH-ISSUE #434] Crash on M2 Max 32GB RAM when running phind-codellama:34b-q5_K_M #46713

Closed
opened 2026-04-27 23:38:29 -05:00 by GiteaMirror · 1 comment

Originally created by @spqw on GitHub (Aug 28, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/434

Running ollama run phind-codellama:34b-q5_K_M on a MacBook Pro M2 Max with 32GB of memory crashes with the following log output.

ggml_metal_add_buffer: allocated 'scr1            ' buffer, size =   256.00 MB, (23984.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 8 failed with status 5
GGML_ASSERT: ggml-metal.m:1177: false

Please find complete logs below.

Questions

  1. How is the value in ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB determined?
  2. Is it a hard limit?
  3. Is there a parameter I can tweak to ignore it?
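
For context on question 1, the figures in the log can be compared directly. The sketch below is only arithmetic on the logged values. It assumes that "32GB" means 32 GiB and that the log's "MB" are MiB, and it does not show how Metal actually derives the number.

```python
# Values copied from the log lines quoted in this issue.
RECOMMENDED_MAX_MB = 21845.34  # ggml_metal_init: recommendedMaxWorkingSetSize
ALLOCATED_MB = 23984.91        # running total after the 'scr1' buffer

# Assumption: "32GB" of memory means 32 GiB, and the log's "MB" are MiB.
PHYSICAL_MB = 32 * 1024

# Share of physical memory that the recommended working set represents.
fraction = RECOMMENDED_MAX_MB / PHYSICAL_MB

# How far the final allocation is above the recommended working set.
overshoot_mb = ALLOCATED_MB - RECOMMENDED_MAX_MB

print(f"recommended / physical = {fraction:.4f}")   # 0.6667
print(f"overshoot = {overshoot_mb:.2f} MB")         # 2139.57 MB
```

On this machine the logged limit is almost exactly two thirds of physical memory, and the final allocation exceeds it by roughly 2.1 GB. Whether two thirds is the rule Metal applies in general is not something the log confirms.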

Next actions
I will try a more aggressively quantized version and report here.
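
As a rough target for that next attempt, the running totals in the complete logs give an upper bound on how large the 'data' buffers could be. This is a sketch under two assumptions that the log does not confirm: that the 'data' buffers hold the model weights, and that the other buffers (eval, kv, scr0, scr1) keep the same size with a different quantization.

```python
# Running totals (MB) reported by ggml_metal_add_buffer in the complete logs.
LIMIT_MB = 21845.34       # recommendedMaxWorkingSetSize
AFTER_DATA_MB = 22939.73  # total once both 'data' buffers are allocated
AFTER_ALL_MB = 23984.91   # total once 'scr1' is allocated

# Everything allocated after the 'data' buffers: eval, kv, scr0, scr1.
overhead_mb = AFTER_ALL_MB - AFTER_DATA_MB

# Largest 'data' total that keeps the sum at or under the limit,
# assuming (hypothetically) that the overhead does not change.
max_data_mb = LIMIT_MB - overhead_mb

# Relative reduction needed compared with the q5_K_M 'data' total.
shrink = 1 - max_data_mb / AFTER_DATA_MB

print(f"overhead = {overhead_mb:.2f} MB")  # 1045.18 MB
print(f"max data = {max_data_mb:.2f} MB")  # 20800.16 MB
print(f"shrink   = {shrink:.1%}")          # 9.3%
```

Under those assumptions, a variant whose 'data' buffers are about 9% smaller than q5_K_M would land at the recommended limit, so anything with a larger reduction should stay under it.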

Complete logs
Running

ollama run phind-codellama:34b-q5_K_M

produces the following output:

llama.cpp: loading model from /Users/lion/.ollama/models/blobs/sha256:454a488edf6348d320b3ba4bc2fdfc98219312e43589b88194fca8ad9b0f1fd0
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 5504
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_head_kv  = 8
llama_model_load_internal: n_layer    = 48
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 8
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: freq_base  = 100000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 17 (mostly Q5_K - Medium)
llama_model_load_internal: model size = 34B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 23392.87 MB (+  384.00 MB per state)
llama_new_context_with_model: kv self size  =  384.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/opt/homebrew/Cellar/ollama/0.0.16/bin/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x12a407f50
ggml_metal_init: loaded kernel_add_row                        0x12a4084d0
ggml_metal_init: loaded kernel_mul                            0x12a408a10
ggml_metal_init: loaded kernel_mul_row                        0x12a409060
ggml_metal_init: loaded kernel_scale                          0x12a4095a0
ggml_metal_init: loaded kernel_silu                           0x12a409ae0
ggml_metal_init: loaded kernel_relu                           0x12a40a020
ggml_metal_init: loaded kernel_gelu                           0x12a40a560
ggml_metal_init: loaded kernel_soft_max                       0x12a40ac30
ggml_metal_init: loaded kernel_diag_mask_inf                  0x12a40b2b0
ggml_metal_init: loaded kernel_get_rows_f16                   0x12a40b950
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x12a40c110
ggml_metal_init: loaded kernel_get_rows_q4_1                  0x12a40c7b0
ggml_metal_init: loaded kernel_get_rows_q2_K                  0x12a305190
ggml_metal_init: loaded kernel_get_rows_q3_K                  0x12a305950
ggml_metal_init: loaded kernel_get_rows_q4_K                  0x12a305ff0
ggml_metal_init: loaded kernel_get_rows_q5_K                  0x12a306690
ggml_metal_init: loaded kernel_get_rows_q6_K                  0x12a306d30
ggml_metal_init: loaded kernel_rms_norm                       0x12a307410
ggml_metal_init: loaded kernel_norm                           0x12a307d70
ggml_metal_init: loaded kernel_mul_mat_f16_f32                0x12a308640
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32               0x12a308d20
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32               0x12a309400
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32               0x12a309c60
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32               0x12a30a340
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32               0x12a30aa20
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32               0x12a30b0e0
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32               0x12a30b9a0
ggml_metal_init: loaded kernel_rope                           0x12a30bee0
ggml_metal_init: loaded kernel_alibi_f32                      0x12a30ca20
ggml_metal_init: loaded kernel_cpy_f32_f16                    0x12a30d2d0
ggml_metal_init: loaded kernel_cpy_f32_f32                    0x12a30db80
ggml_metal_init: loaded kernel_cpy_f16_f16                    0x12a30e310
ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB
ggml_metal_init: hasUnifiedMemory             = true
ggml_metal_init: maxTransferRate              = built-in GPU
llama_new_context_with_model: max tensor size =   205.08 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size = 16384.00 MB, offs =            0
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  6555.28 MB, offs =  16964812800, (22939.73 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =    16.17 MB, (22955.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   386.00 MB, (23341.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scr0            ' buffer, size =   387.00 MB, (23728.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scr1            ' buffer, size =   256.00 MB, (23984.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 8 failed with status 5
GGML_ASSERT: ggml-metal.m:1177: false
SIGABRT: abort
PC=0x19f024764 m=10 sigcode=0
signal arrived during cgo execution

goroutine 27 [syscall]:
runtime.cgocall(0x102aa5070, 0x14000492e28)
	runtime/cgocall.go:157 +0x44 fp=0x14000492df0 sp=0x14000492db0 pc=0x10247e2d4
github.com/jmorganca/ollama/llm._Cfunc_llama_eval(0x14c013600, 0x140003ff0f0, 0x1, 0x0, 0xc)
	_cgo_gotypes.go:216 +0x34 fp=0x14000492e20 sp=0x14000492df0 pc=0x1028098c4
github.com/jmorganca/ollama/llm.newLlama.func6(0x14000000001?, {0x140003ff0f0, 0x1, 0x0?}, 0x0?)
	github.com/jmorganca/ollama/llm/llama.go:293 +0x7c fp=0x14000492e70 sp=0x14000492e20 pc=0x10280ac1c
github.com/jmorganca/ollama/llm.newLlama({0x14000096150, 0x68}, {0x0, 0x0, 0x0?}, {0xffffffffffffffff, 0x0, 0x800, 0xffffffffffffffff, 0x200, ...})
	github.com/jmorganca/ollama/llm/llama.go:293 +0x500 fp=0x140004930e0 sp=0x14000492e70 pc=0x10280aac0
github.com/jmorganca/ollama/llm.New({0x14000096150, 0x68}, {0x0, 0x0, 0x0}, {0xffffffffffffffff, 0x0, 0x800, 0xffffffffffffffff, 0x200, ...})
	github.com/jmorganca/ollama/llm/llm.go:70 +0x408 fp=0x14000493270 sp=0x140004930e0 pc=0x102809348
github.com/jmorganca/ollama/server.load(0x140000fe510, 0x102c7d3c0?, 0x14000136ae0?)
	github.com/jmorganca/ollama/server/routes.go:82 +0x39c fp=0x14000493550 sp=0x14000493270 pc=0x102a98a8c
github.com/jmorganca/ollama/server.GenerateHandler(0x14000482100)
	github.com/jmorganca/ollama/server/routes.go:154 +0x2e8 fp=0x14000493760 sp=0x14000493550 pc=0x102a99188
github.com/gin-gonic/gin.(*Context).Next(...)
	github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0x14000482100)
	github.com/gin-gonic/gin@v1.9.1/recovery.go:102 +0x80 fp=0x140004937b0 sp=0x14000493760 pc=0x102a807b0
github.com/gin-gonic/gin.(*Context).Next(...)
	github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.LoggerWithConfig.func1(0x14000482100)
	github.com/gin-gonic/gin@v1.9.1/logger.go:240 +0xb0 fp=0x14000493960 sp=0x140004937b0 pc=0x102a7fb50
github.com/gin-gonic/gin.(*Context).Next(...)
	github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0x1400036cd00, 0x14000482100)
	github.com/gin-gonic/gin@v1.9.1/gin.go:620 +0x524 fp=0x14000493af0 sp=0x14000493960 pc=0x102a7ec84
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0x1400036cd00, {0x102d82050?, 0x140003d40e0}, 0x14000482200)
	github.com/gin-gonic/gin@v1.9.1/gin.go:576 +0x1a0 fp=0x14000493b30 sp=0x14000493af0 pc=0x102a7e5d0
net/http.serverHandler.ServeHTTP({0x102d801e0?}, {0x102d82050?, 0x140003d40e0?}, 0x6?)
	net/http/server.go:2938 +0xbc fp=0x14000493b60 sp=0x14000493b30 pc=0x10270a38c
net/http.(*conn).serve(0x140000ff0e0, {0x102d83838, 0x14000403ef0})
	net/http/server.go:2009 +0x518 fp=0x14000493fa0 sp=0x14000493b60 pc=0x102706788
net/http.(*Server).Serve.func3()
	net/http/server.go:3086 +0x30 fp=0x14000493fd0 sp=0x14000493fa0 pc=0x10270aaa0
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000493fd0 sp=0x14000493fd0 pc=0x1024e3c04
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3086 +0x4cc

goroutine 1 [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x1400051b5b0 sp=0x1400051b590 pc=0x1024b28b8
runtime.netpollblock(0x1400051b648?, 0x2574664?, 0x1?)
	runtime/netpoll.go:564 +0x158 fp=0x1400051b5f0 sp=0x1400051b5b0 pc=0x1024abfa8
internal/poll.runtime_pollWait(0x12a0dfba0, 0x72)
	runtime/netpoll.go:343 +0xa0 fp=0x1400051b620 sp=0x1400051b5f0 pc=0x1024dd7b0
internal/poll.(*pollDesc).wait(0x1400040a680?, 0x0?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400051b650 sp=0x1400051b620 pc=0x10256fcc8
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x1400040a680)
	internal/poll/fd_unix.go:611 +0x250 fp=0x1400051b700 sp=0x1400051b650 pc=0x102574750
net.(*netFD).accept(0x1400040a680)
	net/fd_unix.go:172 +0x28 fp=0x1400051b7c0 sp=0x1400051b700 pc=0x1025d5e88
net.(*TCPListener).accept(0x140003e79c0)
	net/tcpsock_posix.go:152 +0x28 fp=0x1400051b7f0 sp=0x1400051b7c0 pc=0x1025e9ef8
net.(*TCPListener).Accept(0x140003e79c0)
	net/tcpsock.go:315 +0x2c fp=0x1400051b830 sp=0x1400051b7f0 pc=0x1025e90dc
net/http.(*onceCloseListener).Accept(0x140000ff0e0?)
	<autogenerated>:1 +0x30 fp=0x1400051b850 sp=0x1400051b830 pc=0x10272c3c0
net/http.(*Server).Serve(0x14000332ff0, {0x102d81e40, 0x140003e79c0})
	net/http/server.go:3056 +0x2b8 fp=0x1400051b980 sp=0x1400051b850 pc=0x10270a748
github.com/jmorganca/ollama/server.Serve({0x102d81e40, 0x140003e79c0}, {0x0, 0x0, 0x0})
	github.com/jmorganca/ollama/server/routes.go:457 +0x6cc fp=0x1400051bc50 sp=0x1400051b980 pc=0x102a9c86c
github.com/jmorganca/ollama/cmd.RunServer(0x1400042e200?, {0x102afd6d9?, 0x4?, 0x102afd699?})
	github.com/jmorganca/ollama/cmd/cmd.go:621 +0x1f4 fp=0x1400051bd10 sp=0x1400051bc50 pc=0x102aa2c04
github.com/spf13/cobra.(*Command).execute(0x140003c5500, {0x103291940, 0x0, 0x0})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x658 fp=0x1400051be50 sp=0x1400051bd10 pc=0x1027b3eb8
github.com/spf13/cobra.(*Command).ExecuteC(0x140003c4c00)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x1400051bf10 sp=0x1400051be50 pc=0x1027b45e0
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/jmorganca/ollama/main.go:11 +0x54 fp=0x1400051bf30 sp=0x1400051bf10 pc=0x102aa47d4
runtime.main()
	runtime/proc.go:267 +0x2bc fp=0x1400051bfd0 sp=0x1400051bf30 pc=0x1024b248c
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x1400051bfd0 sp=0x1400051bfd0 pc=0x1024e3c04

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000056f90 sp=0x14000056f70 pc=0x1024b28b8
runtime.goparkunlock(...)
	runtime/proc.go:404
runtime.forcegchelper()
	runtime/proc.go:322 +0xb8 fp=0x14000056fd0 sp=0x14000056f90 pc=0x1024b2748
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000056fd0 sp=0x14000056fd0 pc=0x1024e3c04
created by runtime.init.6 in goroutine 1
	runtime/proc.go:310 +0x24

goroutine 18 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000052760 sp=0x14000052740 pc=0x1024b28b8
runtime.goparkunlock(...)
	runtime/proc.go:404
runtime.bgsweep(0x0?)
	runtime/mgcsweep.go:321 +0x108 fp=0x140000527b0 sp=0x14000052760 pc=0x10249f0b8
runtime.gcenable.func1()
	runtime/mgc.go:200 +0x28 fp=0x140000527d0 sp=0x140000527b0 pc=0x102493b08
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000527d0 sp=0x140000527d0 pc=0x1024e3c04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:200 +0x6c

goroutine 19 [GC scavenge wait]:
runtime.gopark(0x14000096000?, 0x102c2a228?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000052f50 sp=0x14000052f30 pc=0x1024b28b8
runtime.goparkunlock(...)
	runtime/proc.go:404
runtime.(*scavengerState).park(0x1031d1c00)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000052f80 sp=0x14000052f50 pc=0x10249c8ac
runtime.bgscavenge(0x0?)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000052fb0 sp=0x14000052f80 pc=0x10249ce6c
runtime.gcenable.func2()
	runtime/mgc.go:201 +0x28 fp=0x14000052fd0 sp=0x14000052fb0 pc=0x102493aa8
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000052fd0 sp=0x14000052fd0 pc=0x1024e3c04
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:201 +0xac

goroutine 20 [finalizer wait]:
runtime.gopark(0x1400008a820?, 0x1a0?, 0xe8?, 0x65?, 0x10276947c?)
	runtime/proc.go:398 +0xc8 fp=0x14000056580 sp=0x14000056560 pc=0x1024b28b8
runtime.runfinq()
	runtime/mfinal.go:193 +0x108 fp=0x140000567d0 sp=0x14000056580 pc=0x102492bf8
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000567d0 sp=0x140000567d0 pc=0x1024e3c04
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:163 +0x80

goroutine 3 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000057730 sp=0x14000057710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000577d0 sp=0x14000057730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000577d0 sp=0x140000577d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 24 [GC worker (idle)]:
runtime.gopark(0x1?, 0x140003d9310?, 0xa8?, 0x37?, 0x102700c48?)
	runtime/proc.go:398 +0xc8 fp=0x14000053730 sp=0x14000053710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000537d0 sp=0x14000053730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000537d0 sp=0x140000537d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 4 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000057f30 sp=0x14000057f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000057fd0 sp=0x14000057f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000057fd0 sp=0x14000057fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 34 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000506730 sp=0x14000506710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140005067d0 sp=0x14000506730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140005067d0 sp=0x140005067d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 35 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000506f30 sp=0x14000506f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000506fd0 sp=0x14000506f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000506fd0 sp=0x14000506fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 5 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000058730 sp=0x14000058710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000587d0 sp=0x14000058730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000587d0 sp=0x140000587d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 36 [GC worker (idle)]:
runtime.gopark(0x2625f613a5?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000507730 sp=0x14000507710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140005077d0 sp=0x14000507730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140005077d0 sp=0x140005077d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 37 [GC worker (idle)]:
runtime.gopark(0x2625f5c48b?, 0x3?, 0xe2?, 0xf7?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000507f30 sp=0x14000507f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000507fd0 sp=0x14000507f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000507fd0 sp=0x14000507fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 25 [GC worker (idle)]:
runtime.gopark(0x2625f5bbea?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000053f30 sp=0x14000053f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000053fd0 sp=0x14000053f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000053fd0 sp=0x14000053fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 6 [GC worker (idle)]:
runtime.gopark(0x103293620?, 0x1?, 0xb7?, 0x85?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000058f30 sp=0x14000058f10 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x14000058fd0 sp=0x14000058f30 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x14000058fd0 sp=0x14000058fd0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x2625f4e46c?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000059730 sp=0x14000059710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000597d0 sp=0x14000059730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000597d0 sp=0x140000597d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 26 [GC worker (idle)]:
runtime.gopark(0x2625f4d8dd?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:398 +0xc8 fp=0x14000054730 sp=0x14000054710 pc=0x1024b28b8
runtime.gcBgMarkWorker()
	runtime/mgc.go:1293 +0xd8 fp=0x140000547d0 sp=0x14000054730 pc=0x102495758
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140000547d0 sp=0x140000547d0 pc=0x1024e3c04
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1217 +0x28

goroutine 9 [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1024f8a90?)
	runtime/proc.go:398 +0xc8 fp=0x14000508540 sp=0x14000508520 pc=0x1024b28b8
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:564 +0x158 fp=0x14000508580 sp=0x14000508540 pc=0x1024abfa8
internal/poll.runtime_pollWait(0x12a0dfaa8, 0x72)
	runtime/netpoll.go:343 +0xa0 fp=0x140005085b0 sp=0x14000508580 pc=0x1024dd7b0
internal/poll.(*pollDesc).wait(0x1400007e000?, 0x140004348e1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005085e0 sp=0x140005085b0 pc=0x10256fcc8
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400007e000, {0x140004348e1, 0x1, 0x1})
	internal/poll/fd_unix.go:164 +0x200 fp=0x14000508680 sp=0x140005085e0 pc=0x102571010
net.(*netFD).Read(0x1400007e000, {0x140004348e1?, 0x102ccfaa0?, 0x102ce4c80?})
	net/fd_posix.go:55 +0x28 fp=0x140005086d0 sp=0x14000508680 pc=0x1025d4278
net.(*conn).Read(0x140000aee90, {0x140004348e1?, 0x1?, 0x140003d8050?})
	net/net.go:179 +0x34 fp=0x14000508720 sp=0x140005086d0 pc=0x1025e1744
net.(*TCPConn).Read(0x140004348d0?, {0x140004348e1?, 0x140003d8050?, 0x0?})
	<autogenerated>:1 +0x2c fp=0x14000508750 sp=0x14000508720 pc=0x1025f2ddc
net/http.(*connReader).backgroundRead(0x140004348d0)
	net/http/server.go:683 +0x40 fp=0x140005087b0 sp=0x14000508750 pc=0x102700d20
net/http.(*connReader).startBackgroundRead.func2()
	net/http/server.go:679 +0x28 fp=0x140005087d0 sp=0x140005087b0 pc=0x102700c48
runtime.goexit()
	runtime/asm_arm64.s:1197 +0x4 fp=0x140005087d0 sp=0x140005087d0 pc=0x1024e3c04
created by net/http.(*connReader).startBackgroundRead in goroutine 27
	net/http/server.go:679 +0xc8

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x0
r5      0x1721eabe0
r6      0xa
r7      0x0
r8      0xd49f50ddc5c353f6
r9      0xd49f50dcb7dde3f6
r10     0x2
r11     0xfffffffd
r12     0x10000000000
r13     0x0
r14     0x0
r15     0x0
r16     0x148
r17     0x1fec033a0
r18     0x0
r19     0x6
r20     0x1721eb000
r21     0x1803
r22     0x1721eb0e0
r23     0x14d800020
r24     0x60000187f780
r25     0xc
r26     0x6f4
r27     0x600003657c20
r28     0xc
r29     0x1721eab90
lr      0x19f05bc28
sp      0x1721eab70
pc      0x19f024764
fault   0x19f024764
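
The ggml_metal_add_buffer lines in the log above can be tallied with a short script, which makes it easier to see which allocation first crosses the recommended working set. This is a minimal sketch: the function names `parse_buffers` and `first_over_limit` are hypothetical helpers written for this illustration, and the regular expressions only assume the line format visible in this particular log.

```python
import re

# The six buffer lines, copied from the log above.
LOG = """\
ggml_metal_add_buffer: allocated 'data            ' buffer, size = 16384.00 MB, offs =            0
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  6555.28 MB, offs =  16964812800, (22939.73 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =    16.17 MB, (22955.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   386.00 MB, (23341.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scr0            ' buffer, size =   387.00 MB, (23728.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scr1            ' buffer, size =   256.00 MB, (23984.91 / 21845.34), warning: current allocated size is greater than the recommended max working set size
"""

# Buffer name (padding stripped) and its size in MB.
BUFFER_RE = re.compile(r"allocated '(\w+)\s*' buffer, size =\s*([\d.]+) MB")
# Optional "(running total / limit)" pair; absent on the first line of this log.
TOTAL_RE = re.compile(r"\(\s*([\d.]+)\s*/\s*([\d.]+)\s*\)")


def parse_buffers(log_text):
    """Return one (name, size_mb, total_mb, limit_mb) tuple per buffer line.

    total_mb and limit_mb are None when the line carries no running total.
    """
    rows = []
    for line in log_text.splitlines():
        buf = BUFFER_RE.search(line)
        if buf is None:
            continue
        total = TOTAL_RE.search(line)
        rows.append((
            buf.group(1),
            float(buf.group(2)),
            float(total.group(1)) if total else None,
            float(total.group(2)) if total else None,
        ))
    return rows


def first_over_limit(rows):
    """Name of the first buffer whose running total exceeds the limit, or None."""
    for name, _size, total, limit in rows:
        if total is not None and total > limit:
            return name
    return None


rows = parse_buffers(LOG)
for name, size, total, limit in rows:
    print(f"{name:<5} {size:>9.2f} MB  total={total}  limit={limit}")
print("first over limit:", first_over_limit(rows))
```

For this log the second 'data' buffer is already over the limit, before the eval, kv, and scratch buffers are added.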
runtime/proc.go:398 +0xc8 fp=0x14000057f30 sp=0x14000057f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000057fd0 sp=0x14000057f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000057fd0 sp=0x14000057fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 34 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000506730 sp=0x14000506710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140005067d0 sp=0x14000506730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140005067d0 sp=0x140005067d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 35 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000506f30 sp=0x14000506f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000506fd0 sp=0x14000506f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000506fd0 sp=0x14000506fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 5 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000058730 sp=0x14000058710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140000587d0 sp=0x14000058730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140000587d0 sp=0x140000587d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 36 [GC worker (idle)]: runtime.gopark(0x2625f613a5?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:398 +0xc8 fp=0x14000507730 sp=0x14000507710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140005077d0 sp=0x14000507730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140005077d0 sp=0x140005077d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 37 [GC worker (idle)]: runtime.gopark(0x2625f5c48b?, 0x3?, 0xe2?, 0xf7?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000507f30 sp=0x14000507f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000507fd0 sp=0x14000507f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000507fd0 sp=0x14000507fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 25 [GC worker (idle)]: runtime.gopark(0x2625f5bbea?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000053f30 sp=0x14000053f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000053fd0 sp=0x14000053f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000053fd0 sp=0x14000053fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 6 [GC worker (idle)]: runtime.gopark(0x103293620?, 0x1?, 0xb7?, 0x85?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000058f30 sp=0x14000058f10 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x14000058fd0 sp=0x14000058f30 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x14000058fd0 sp=0x14000058fd0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 7 [GC worker (idle)]: runtime.gopark(0x2625f4e46c?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:398 +0xc8 fp=0x14000059730 sp=0x14000059710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140000597d0 sp=0x14000059730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140000597d0 sp=0x140000597d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 26 [GC worker (idle)]: runtime.gopark(0x2625f4d8dd?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:398 +0xc8 fp=0x14000054730 sp=0x14000054710 pc=0x1024b28b8 runtime.gcBgMarkWorker() runtime/mgc.go:1293 +0xd8 fp=0x140000547d0 sp=0x14000054730 pc=0x102495758 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140000547d0 sp=0x140000547d0 pc=0x1024e3c04 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1217 +0x28 goroutine 9 [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1024f8a90?) runtime/proc.go:398 +0xc8 fp=0x14000508540 sp=0x14000508520 pc=0x1024b28b8 runtime.netpollblock(0x0?, 0x0?, 0x0?) runtime/netpoll.go:564 +0x158 fp=0x14000508580 sp=0x14000508540 pc=0x1024abfa8 internal/poll.runtime_pollWait(0x12a0dfaa8, 0x72) runtime/netpoll.go:343 +0xa0 fp=0x140005085b0 sp=0x14000508580 pc=0x1024dd7b0 internal/poll.(*pollDesc).wait(0x1400007e000?, 0x140004348e1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005085e0 sp=0x140005085b0 pc=0x10256fcc8 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x1400007e000, {0x140004348e1, 0x1, 0x1}) internal/poll/fd_unix.go:164 +0x200 fp=0x14000508680 sp=0x140005085e0 pc=0x102571010 net.(*netFD).Read(0x1400007e000, {0x140004348e1?, 0x102ccfaa0?, 0x102ce4c80?}) net/fd_posix.go:55 +0x28 fp=0x140005086d0 sp=0x14000508680 pc=0x1025d4278 net.(*conn).Read(0x140000aee90, {0x140004348e1?, 0x1?, 0x140003d8050?}) net/net.go:179 +0x34 fp=0x14000508720 sp=0x140005086d0 pc=0x1025e1744 net.(*TCPConn).Read(0x140004348d0?, {0x140004348e1?, 0x140003d8050?, 0x0?}) <autogenerated>:1 +0x2c fp=0x14000508750 sp=0x14000508720 pc=0x1025f2ddc net/http.(*connReader).backgroundRead(0x140004348d0) net/http/server.go:683 +0x40 fp=0x140005087b0 sp=0x14000508750 pc=0x102700d20 net/http.(*connReader).startBackgroundRead.func2() net/http/server.go:679 +0x28 fp=0x140005087d0 sp=0x140005087b0 pc=0x102700c48 runtime.goexit() runtime/asm_arm64.s:1197 +0x4 fp=0x140005087d0 sp=0x140005087d0 pc=0x1024e3c04 created by net/http.(*connReader).startBackgroundRead in goroutine 27 net/http/server.go:679 +0xc8 r0 0x0 r1 0x0 r2 0x0 r3 0x0 r4 0x0 r5 0x1721eabe0 r6 0xa r7 0x0 r8 0xd49f50ddc5c353f6 r9 0xd49f50dcb7dde3f6 r10 0x2 r11 0xfffffffd r12 0x10000000000 r13 0x0 r14 0x0 r15 0x0 r16 0x148 r17 0x1fec033a0 r18 0x0 r19 0x6 r20 0x1721eb000 r21 0x1803 r22 0x1721eb0e0 r23 0x14d800020 r24 0x60000187f780 r25 0xc r26 0x6f4 r27 0x600003657c20 r28 0xc r29 0x1721eab90 lr 0x19f05bc28 sp 0x1721eab70 pc 0x19f024764 fault 0x19f024764 ```
GiteaMirror added the bug label 2026-04-27 23:38:30 -05:00

@mchiang0610 commented on GitHub (Aug 30, 2023):

@spqw Would it be possible to see if you are still having issues with the latest Ollama (v0.0.17)? Sorry about that.

If you still have trouble with the latest version, please feel free to re-open this!

Questions:

  1. That value is Apple's recommendation, not something Ollama sets; it is determined based on what is loaded into the GPU.
  2. No, there is no hard limit in software.
  3. You can reduce the context size, which reduces the amount of memory used. If you are over by a lot, it will still error.
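
To get a feel for point 3, here is a rough sketch of how much memory the context size accounts for, using the hyperparameters printed in the logs above (`n_layer = 48`, `n_embd = 8192`, `n_gqa = 8`, `n_ctx = 2048`). It assumes the key/value cache is stored as 16-bit floats and that the log's "MB" means 1024 × 1024 bytes; it ignores scratch and compute buffers, which may also change with context size, so treat the numbers as an estimate rather than what Ollama actually allocates.

```python
# Rough KV-cache estimate for the model in the logs above.
# Assumptions (not confirmed by the thread): K and V are stored as
# 16-bit floats, and "MB" in the logs means 1024 * 1024 bytes.

MB = 1024 * 1024


def kv_cache_mb(n_ctx, n_layer=48, n_embd=8192, n_gqa=8, bytes_per_elem=2):
    """Estimated size of the key and value caches for one sequence, in MB."""
    # With grouped-query attention, K and V are n_embd / n_gqa wide per token.
    kv_dim = n_embd // n_gqa
    # Factor 2 covers the two caches (keys and values).
    return 2 * n_layer * n_ctx * kv_dim * bytes_per_elem / MB


allocated_mb = 23984.91    # from the ggml_metal_add_buffer log line
recommended_mb = 21845.34  # recommendedMaxWorkingSetSize from the same line
overshoot_mb = allocated_mb - recommended_mb

baseline = kv_cache_mb(2048)
for n_ctx in (2048, 1024, 512):
    saved = baseline - kv_cache_mb(n_ctx)
    print(
        f"n_ctx={n_ctx}: KV cache ~{kv_cache_mb(n_ctx):.0f} MB, "
        f"saves ~{saved:.0f} MB of a ~{overshoot_mb:.0f} MB overshoot"
    )
```

Under these assumptions the KV cache is only a few hundred MB at `n_ctx = 2048`, while the allocation in the crash log exceeds the recommended working set by roughly 2 GB. That is consistent with the remark that reducing the context will not help when you are over by a lot.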

In the updated version, if you are way over the memory limit, we'll warn you.
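
For anyone wanting to try the smaller context size suggested in point 3, one possible way is to pass `num_ctx` in the `options` of a request to Ollama's HTTP API. The sketch below only builds the request. The host and port (`http://localhost:11434`), the `/api/generate` endpoint and the `options.num_ctx` field are assumptions about the installed Ollama version and should be checked against its API documentation; `build_generate_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_generate_request(model, prompt, num_ctx, host="http://localhost:11434"):
    """Build (but do not send) a generate request with a reduced context size.

    Assumption: the server accepts an "options" object with "num_ctx".
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "options": {"num_ctx": num_ctx},  # context window, in tokens
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("phind-codellama:34b-q5_K_M", "Hello", num_ctx=1024)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it is then a matter of calling `urllib.request.urlopen(req)` while the Ollama server is running and reading the response body.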

Reference: github-starred/ollama#46713