[GH-ISSUE #15228] Ollama recently stopped being able to run a 70b model #56250

Open
opened 2026-04-29 10:28:52 -05:00 by GiteaMirror · 6 comments

Originally created by @micseydel on GitHub (Apr 2, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15228

What is the issue?

I have a 64 GB M4 Mac on which I've been running llama3.3:70b for some time. It last worked on 2026-03-07 and failed on 2026-03-11. I suspect I updated Ollama in between, but I don't track updates and I'm not sure where to look in the logs. Here's an example failure:

micseydel@Mac ~ % ollama run llama3.3:70b
>>> what color is the sky?
Error: 500 Internal Server Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details

I've included the contents of ~/.ollama/logs/server.log, captured after hitting enter above, under Relevant log output.

Relevant log output

ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml-metal-context.m:235: fatal error
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   ollama                              0x0000000101362ae4 ggml_print_backtrace + 276
1   ollama                              0x0000000101362cd0 ggml_abort + 156
2   ollama                              0x00000001015cb340 ggml_metal_synchronize + 208
3   ollama                              0x0000000101381ae0 ggml_backend_sched_graph_compute_async + 924
4   ollama                              0x00000001013f7888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160
5   ollama                              0x00000001013f7538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588
6   ollama                              0x00000001013f8c04 _ZN13llama_context6decodeERK11llama_batch + 1556
7   ollama                              0x00000001013fd4a0 llama_decode + 20
8   ollama                              0x000000010131b3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72
9   ollama                              0x000000010044b20c ollama + 520716
SIGABRT: abort
PC=0x183ac4388 m=4 sigcode=0
signal arrived during cgo execution

goroutine 66 gp=0x140004841c0 m=4 mp=0x14000100008 [syscall]:
runtime.cgocall(0x10131b398, 0x14000080b58)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000080b20 sp=0x14000080ae0 pc=0x10043f974
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x12a504760, {0x10, 0x168008200, 0x0, 0x168008a00, 0x168009200, 0x168009a00, 0x12ba04080})
	_cgo_gotypes.go:685 +0x34 fp=0x14000080b50 sp=0x14000080b20 pc=0x100890c44
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	/Users/runner/work/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0x14000034300?, 0x1004432f8?)
	/Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000080c40 sp=0x14000080b50 pc=0x100893008
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000590140, 0x1400025e230, 0x14000253f18)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000080ed0 sp=0x14000080c40 pc=0x100934058
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000590140, {0x101be1a30, 0x1400058e0a0})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000080fa0 sp=0x14000080ed0 pc=0x100933d04
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000080fd0 sp=0x14000080fa0 pc=0x100938210
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000080fd0 sp=0x14000080fd0 pc=0x10044b414
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c

goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000523710 sp=0x140005236f0 pc=0x100442e98
runtime.netpollblock(0x140004a37a8?, 0x4c77d0?, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000523750 sp=0x14000523710 pc=0x1004088f8
internal/poll.runtime_pollWait(0x12a297dd0, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000523780 sp=0x14000523750 pc=0x100442050
internal/poll.(*pollDesc).wait(0x1400058c100?, 0x1004c9a38?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005237b0 sp=0x14000523780 pc=0x1004c2fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x1400058c100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x14000523860 sp=0x140005237b0 pc=0x1004c78bc
net.(*netFD).accept(0x1400058c100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x14000523920 sp=0x14000523860 pc=0x100537b28
net.(*TCPListener).accept(0x1400058a080)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x14000523970 sp=0x14000523920 pc=0x10054c304
net.(*TCPListener).Accept(0x1400058a080)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140005239b0 sp=0x14000523970 pc=0x10054b2ec
net/http.(*onceCloseListener).Accept(0x140005a2090?)
	<autogenerated>:1 +0x30 fp=0x140005239d0 sp=0x140005239b0 pc=0x100734cc0
net/http.(*Server).Serve(0x1400012c800, {0x101bdefc0, 0x1400058a080})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x14000523b00 sp=0x140005239d0 pc=0x10070e400
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000132140, 0x4, 0x4})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x14000523cd0 sp=0x14000523b00 pc=0x100937fec
github.com/ollama/ollama/runner.Execute({0x14000132130?, 0x0?, 0x0?})
	/Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x14000523d10 sp=0x14000523cd0 pc=0x100a746fc
github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035600?, {0x101620986?, 0x4?, 0x10162098a?})
	/Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x14000523d40 sp=0x14000523d10 pc=0x101179714
github.com/spf13/cobra.(*Command).execute(0x1400030bb08, {0x1400028f9c0, 0x4, 0x4})
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x14000523e60 sp=0x14000523d40 pc=0x1005a69c8
github.com/spf13/cobra.(*Command).ExecuteC(0x140000f8908)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x14000523f20 sp=0x14000523e60 pc=0x1005a7110
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	/Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x14000523f40 sp=0x14000523f20 pc=0x10117ae94
runtime.main()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x14000523fd0 sp=0x14000523f40 pc=0x10040f464
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000523fd0 sp=0x14000523fd0 pc=0x10044b414

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x100442e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.forcegchelper()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10040f7b8
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x10044b414
created by runtime.init.7 in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x100442e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.bgsweep(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x1003fa898
runtime.gcenable.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x1003ee698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x10044b414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x101840360?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x100442e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x10268f960)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x1003f832c
runtime.bgscavenge(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x1003f88cc
runtime.gcenable.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1003ee638
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x10044b414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x180006c5c8?, 0x1293d9b88?, 0xc0?, 0x45?, 0x1c0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x100442e98
runtime.runfinq()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x1003ed698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x10044b414
created by runtime.createfing in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80

goroutine 34 gp=0x140002f01c0 m=nil [chan receive]:
runtime.gopark(0x140002a9220?, 0x1400031c180?, 0x48?, 0x87?, 0x10050bc58?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000686f0 sp=0x140000686d0 pc=0x100442e98
runtime.chanrecv(0x140002f81c0, 0x0, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x14000068770 sp=0x140000686f0 pc=0x1003dfa0c
runtime.chanrecv1(0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x140000687a0 sp=0x14000068770 pc=0x1003df5a4
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x140000687d0 sp=0x140000687a0 pc=0x1003f18bc
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x10044b414
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78

goroutine 35 gp=0x140002f0380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250710 sp=0x140002506f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002507b0 sp=0x14000250710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002507d0 sp=0x140002507b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002507d0 sp=0x140002507d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 36 gp=0x140002f0540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 20 gp=0x14000103180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250f10 sp=0x14000250ef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000250fb0 sp=0x14000250f10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000250fd0 sp=0x14000250fb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000250fd0 sp=0x14000250fd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 21 gp=0x14000103340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251710 sp=0x140002516f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002517b0 sp=0x14000251710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002517d0 sp=0x140002517b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002517d0 sp=0x140002517d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 22 gp=0x14000103500 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf41a057200f?, 0x3?, 0x22?, 0x17?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251f10 sp=0x14000251ef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000251fb0 sp=0x14000251f10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000251fd0 sp=0x14000251fb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000251fd0 sp=0x14000251fd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 37 gp=0x140002f0a80 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf41a0571a5d?, 0x3?, 0x61?, 0xdc?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069f10 sp=0x14000069ef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000069fb0 sp=0x14000069f10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 38 gp=0x140002f0c40 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf41a0567389?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 50 gp=0x14000484000 m=nil [GC worker (idle)]:
runtime.gopark(0x1026dd000?, 0x1?, 0x76?, 0x16?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 7 gp=0x14000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x1026dd000?, 0x1?, 0x2e?, 0xad?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 67 gp=0x14000484380 m=nil [select]:
runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x100442e98
runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x100422ad4
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000590140, {0x101bdf1a0, 0x1400052e700}, 0x1400026f2c0)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1009359dc
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x101bdf1a0?, 0x1400052e700?}, 0x14000045b28?)
	<autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x100938600
net/http.HandlerFunc.ServeHTTP(0x14000594000?, {0x101bdf1a0?, 0x1400052e700?}, 0x14000045b10?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10070ae28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x101bdf1a0, 0x1400052e700}, 0x1400026f2c0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x10070c9b4
net/http.serverHandler.ServeHTTP({0x101bdb230?}, {0x101bdf1a0?, 0x1400052e700?}, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x10072869c
net/http.(*conn).serve(0x140005a2090, {0x101be19f8, 0x14000588360})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x1007095cc
net/http.(*Server).Serve.gowrap3()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x10070e790
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x10044b414
created by net/http.(*Server).Serve in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8

goroutine 32 gp=0x140002f1180 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x100466c30?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000253580 sp=0x14000253560 pc=0x100442e98
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140002535c0 sp=0x14000253580 pc=0x1004088f8
internal/poll.runtime_pollWait(0x12a297cb8, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140002535f0 sp=0x140002535c0 pc=0x100442050
internal/poll.(*pollDesc).wait(0x1400058c180?, 0x14000412041?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000253620 sp=0x140002535f0 pc=0x1004c2fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400058c180, {0x14000412041, 0x1, 0x1})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x140002536c0 sp=0x14000253620 pc=0x1004c429c
net.(*netFD).Read(0x1400058c180, {0x14000412041?, 0x14000253758?, 0x100704044?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x14000253710 sp=0x140002536c0 pc=0x1005360f8
net.(*conn).Read(0x14000070030, {0x14000412041?, 0x0?, 0x0?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x14000253760 sp=0x14000253710 pc=0x100542fc4
net/http.(*connReader).backgroundRead(0x14000412030)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x140002537b0 sp=0x14000253760 pc=0x100703f40
net/http.(*connReader).startBackgroundRead.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x140002537d0 sp=0x140002537b0 pc=0x100703e28
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002537d0 sp=0x140002537d0 pc=0x10044b414
created by net/http.(*connReader).startBackgroundRead in goroutine 67
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x183a09df8
r5      0x171a4b650
r6      0x36
r7      0x0
r8      0x46aa15ae81e80719
r9      0x46aa15aff04d3719
r10     0x3bb
r11     0x6
r12     0x6
r13     0x171a4b382
r14     0x1023263a8
r15     0x1
r16     0x148
r17     0x1f3a40ac0
r18     0x0
r19     0x6
r20     0xc03
r21     0x171a530e0
r22     0x0
r23     0x2
r24     0x12a504d68
r25     0x171a4ca08
r26     0xcf05e82c0
r27     0xcf05e8000
r28     0x1
r29     0x171a4bf40
lr      0x183afd88c
sp      0x171a4bf20
pc      0x183ac4388
fault   0x183ac4388
time=2026-04-02T10:42:35.087-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
time=2026-04-02T10:42:35.087-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50202/completion\": EOF"
[GIN] 2026/04/02 - 10:42:35 | 500 | 20.477471958s |       127.0.0.1 | POST     "/api/chat"
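
For anyone triaging a similar dump, here is a minimal sketch of how the few meaningful lines could be pulled out of a log like the one above. It is not part of Ollama. The marker strings and the `time=... level=ERROR source=... msg="..."` layout are copied from this log, and treating them as the relevant signals is an assumption. `summarize_log`, `OOM_MARKERS`, and `ERROR_LINE` are hypothetical names made up for this example.

```python
import re
from pathlib import Path

# Substrings copied from the log output above. Treating them as the
# out-of-memory signature is an assumption made for this sketch.
OOM_MARKERS = (
    "kIOGPUCommandBufferCallbackErrorOutOfMemory",
    "Insufficient Memory",
)

# Matches structured lines shaped like the ones at the end of the log:
# time=<timestamp> level=ERROR source=<file:line> msg="<message>" ...
ERROR_LINE = re.compile(
    r'^time=(?P<time>\S+) level=ERROR source=(?P<source>\S+) msg="(?P<msg>[^"]*)"'
)


def summarize_log(lines):
    """Return (oom_lines, error_records) found in an iterable of log lines."""
    oom_lines = []
    error_records = []
    for line in lines:
        line = line.rstrip("\n")
        # Collect any line that carries one of the out-of-memory markers.
        if any(marker in line for marker in OOM_MARKERS):
            oom_lines.append(line)
        # Collect timestamp, source, and message of each ERROR-level record.
        match = ERROR_LINE.match(line)
        if match:
            error_records.append(match.groupdict())
    return oom_lines, error_records


if __name__ == "__main__":
    # Log path as given in the report; nothing happens if the file is absent.
    log_path = Path.home() / ".ollama" / "logs" / "server.log"
    if log_path.exists():
        with log_path.open(errors="replace") as handle:
            oom, errors = summarize_log(handle)
        print(f"out-of-memory lines: {len(oom)}")
        for record in errors:
            print(record["time"], record["source"], record["msg"])
```

Run against the log in this report, the sketch would flag the `Insufficient Memory` line and the two `level=ERROR` records. The goroutine and register dumps are skipped, since they describe the abort rather than its cause.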

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.19.0

/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.bgsweep(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x1003fa898 runtime.gcenable.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x1003ee698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x10044b414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x101840360?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x100442e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.(*scavengerState).park(0x10268f960) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x1003f832c runtime.bgscavenge(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x1003f88cc runtime.gcenable.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1003ee638 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x10044b414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac goroutine 18 gp=0x14000102700 m=nil [finalizer wait]: runtime.gopark(0x180006c5c8?, 0x1293d9b88?, 0xc0?, 0x45?, 0x1c0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x100442e98 runtime.runfinq() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x1003ed698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x10044b414 created by runtime.createfing in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80 goroutine 34 gp=0x140002f01c0 m=nil [chan receive]: runtime.gopark(0x140002a9220?, 0x1400031c180?, 0x48?, 0x87?, 0x10050bc58?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000686f0 sp=0x140000686d0 pc=0x100442e98 runtime.chanrecv(0x140002f81c0, 0x0, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x14000068770 sp=0x140000686f0 pc=0x1003dfa0c runtime.chanrecv1(0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x140000687a0 sp=0x14000068770 pc=0x1003df5a4 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x140000687d0 sp=0x140000687a0 pc=0x1003f18bc runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x10044b414 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78 goroutine 35 gp=0x140002f0380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250710 sp=0x140002506f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002507b0 sp=0x14000250710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002507d0 sp=0x140002507b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002507d0 sp=0x140002507d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 36 gp=0x140002f0540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 20 gp=0x14000103180 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250f10 sp=0x14000250ef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000250fb0 sp=0x14000250f10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000250fd0 sp=0x14000250fb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000250fd0 sp=0x14000250fd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 21 gp=0x14000103340 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251710 sp=0x140002516f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002517b0 sp=0x14000251710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002517d0 sp=0x140002517b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002517d0 sp=0x140002517d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 22 gp=0x14000103500 m=nil [GC worker (idle)]: runtime.gopark(0x6bf41a057200f?, 0x3?, 0x22?, 0x17?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251f10 sp=0x14000251ef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000251fb0 sp=0x14000251f10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000251fd0 sp=0x14000251fb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000251fd0 sp=0x14000251fd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 37 gp=0x140002f0a80 m=nil [GC worker (idle)]: runtime.gopark(0x6bf41a0571a5d?, 0x3?, 0x61?, 0xdc?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069f10 sp=0x14000069ef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000069fb0 sp=0x14000069f10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 38 gp=0x140002f0c40 m=nil [GC worker (idle)]: runtime.gopark(0x6bf41a0567389?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 50 gp=0x14000484000 m=nil [GC worker (idle)]: runtime.gopark(0x1026dd000?, 0x1?, 0x76?, 0x16?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 7 gp=0x14000003c00 m=nil [GC worker (idle)]: runtime.gopark(0x1026dd000?, 0x1?, 0x2e?, 0xad?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 67 gp=0x14000484380 m=nil [select]: runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x100442e98 runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x100422ad4 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000590140, {0x101bdf1a0, 0x1400052e700}, 0x1400026f2c0) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1009359dc github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x101bdf1a0?, 0x1400052e700?}, 0x14000045b28?) <autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x100938600 net/http.HandlerFunc.ServeHTTP(0x14000594000?, {0x101bdf1a0?, 0x1400052e700?}, 0x14000045b10?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10070ae28 net/http.(*ServeMux).ServeHTTP(0x10?, {0x101bdf1a0, 0x1400052e700}, 0x1400026f2c0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x10070c9b4 net/http.serverHandler.ServeHTTP({0x101bdb230?}, {0x101bdf1a0?, 0x1400052e700?}, 0x1?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x10072869c net/http.(*conn).serve(0x140005a2090, {0x101be19f8, 0x14000588360}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x1007095cc net/http.(*Server).Serve.gowrap3() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x10070e790 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x10044b414 created by net/http.(*Server).Serve in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8 goroutine 32 gp=0x140002f1180 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x100466c30?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000253580 sp=0x14000253560 pc=0x100442e98 runtime.netpollblock(0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140002535c0 sp=0x14000253580 pc=0x1004088f8 internal/poll.runtime_pollWait(0x12a297cb8, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140002535f0 sp=0x140002535c0 pc=0x100442050 internal/poll.(*pollDesc).wait(0x1400058c180?, 0x14000412041?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000253620 sp=0x140002535f0 pc=0x1004c2fe8 internal/poll.(*pollDesc).waitRead(...) 
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400058c180, {0x14000412041, 0x1, 0x1})
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x140002536c0 sp=0x14000253620 pc=0x1004c429c
net.(*netFD).Read(0x1400058c180, {0x14000412041?, 0x14000253758?, 0x100704044?})
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x14000253710 sp=0x140002536c0 pc=0x1005360f8
net.(*conn).Read(0x14000070030, {0x14000412041?, 0x0?, 0x0?})
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x14000253760 sp=0x14000253710 pc=0x100542fc4
net/http.(*connReader).backgroundRead(0x14000412030)
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x140002537b0 sp=0x14000253760 pc=0x100703f40
net/http.(*connReader).startBackgroundRead.gowrap2()
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x140002537d0 sp=0x140002537b0 pc=0x100703e28
runtime.goexit({})
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002537d0 sp=0x140002537d0 pc=0x10044b414
created by net/http.(*connReader).startBackgroundRead in goroutine 67
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x183a09df8
r5      0x171a4b650
r6      0x36
r7      0x0
r8      0x46aa15ae81e80719
r9      0x46aa15aff04d3719
r10     0x3bb
r11     0x6
r12     0x6
r13     0x171a4b382
r14     0x1023263a8
r15     0x1
r16     0x148
r17     0x1f3a40ac0
r18     0x0
r19     0x6
r20     0xc03
r21     0x171a530e0
r22     0x0
r23     0x2
r24     0x12a504d68
r25     0x171a4ca08
r26     0xcf05e82c0
r27     0xcf05e8000
r28     0x1
r29     0x171a4bf40
lr      0x183afd88c
sp      0x171a4bf20
pc      0x183ac4388
fault   0x183ac4388
time=2026-04-02T10:42:35.087-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
time=2026-04-02T10:42:35.087-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50202/completion\": EOF"
[GIN] 2026/04/02 - 10:42:35 | 500 | 20.477471958s | 127.0.0.1 | POST "/api/chat"
```

### OS

macOS

### GPU

Apple

### CPU

Apple

### Ollama version

0.19.0
GiteaMirror added the bug label 2026-04-29 10:28:52 -05:00

@rick-github commented on GitHub (Apr 2, 2026):

Include the log from before the crash.
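
One way to pull out the requested part of the log is sketched below. It assumes the log lives at `~/.ollama/logs/server.log` (the path given in the report) and that the first line containing `ggml_metal_synchronize: error` marks the crash; the helper name `lines_before_crash` and the 200-line window are made up for illustration.

```python
from pathlib import Path

# Assumed crash marker: the first fatal Metal line quoted in the report.
CRASH_MARKER = "ggml_metal_synchronize: error"


def lines_before_crash(text, marker=CRASH_MARKER, keep=200):
    """Return up to `keep` log lines preceding the first line that contains `marker`.

    If the marker never appears, the last `keep` lines are returned instead.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            return lines[max(0, i - keep):i]
    return lines[-keep:]


if __name__ == "__main__":
    # Path taken from the report; adjust if the server logs elsewhere.
    log_path = Path.home() / ".ollama" / "logs" / "server.log"
    if log_path.exists():
        print("\n".join(lines_before_crash(log_path.read_text(errors="replace"))))
```

The lines of interest are the ones the server prints while loading the model, since they show how memory was planned before the runner aborted.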


@micseydel commented on GitHub (Apr 2, 2026):

Here's the full thing all at once:

time=2026-04-02T10:11:41.544-07:00 level=INFO source=routes.go:1742 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/micseydel/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:true OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2026-04-02T10:11:41.545-07:00 level=INFO source=routes.go:1744 msg="Ollama cloud disabled: true"
time=2026-04-02T10:11:41.552-07:00 level=INFO source=images.go:477 msg="total blobs: 86"
time=2026-04-02T10:11:41.552-07:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-04-02T10:11:41.553-07:00 level=INFO source=routes.go:1800 msg="Listening on [::]:11434 (version 0.19.0)"
time=2026-04-02T10:11:41.553-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-02T10:11:41.554-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --ollama-engine --port 50139"
time=2026-04-02T10:11:41.668-07:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M4 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="48.0 GiB" available="48.0 GiB"
time=2026-04-02T10:11:41.668-07:00 level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="48.0 GiB" default_num_ctx=262144
[GIN] 2026/04/02 - 10:11:41 | 200 |     377.084µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/04/02 - 10:11:46 | 200 |     156.458µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/02 - 10:11:46 | 200 |  103.326333ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/02 - 10:11:46 | 200 |   53.286083ms |       127.0.0.1 | POST     "/api/show"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 51539.61 MB
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-04-02T10:11:47.088-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-04-02T10:11:47.089-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50145"
time=2026-04-02T10:11:47.092-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="47.1 GiB" free_swap="0 B"
time=2026-04-02T10:11:47.092-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:11:47.092-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
time=2026-04-02T10:11:47.119-07:00 level=INFO source=runner.go:965 msg="starting go runner"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 51539.61 MB
time=2026-04-02T10:11:47.119-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:11:47.193-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50145"
time=2026-04-02T10:11:47.204-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:11:47.205-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:11:47.205-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 8192
print_info: n_embd_inp       = 8192
print_info: n_layer          = 80
print_info: n_head           = 64
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 28672
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 70B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)

load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloaded 30/81 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 40543.11 MiB
load_tensors: Metal_Mapped model buffer size = 39721.13 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 131072
llama_context: n_ctx_seq     = 131072
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 500000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M4 Pro
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.52 MiB
llama_kv_cache:        CPU KV buffer size = 25600.00 MiB
llama_kv_cache:      Metal KV buffer size = 15360.00 MiB
llama_kv_cache: size = 40960.00 MiB (131072 cells,  80 layers,  1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context:      Metal compute buffer size =   328.01 MiB
llama_context:        CPU compute buffer size =   448.01 MiB
llama_context: graph nodes  = 2487
llama_context: graph splits = 503 (with bs=512), 3 (with bs=1)
time=2026-04-02T10:12:09.571-07:00 level=INFO source=server.go:1390 msg="llama runner started in 22.48 seconds"
time=2026-04-02T10:12:09.572-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-02T10:12:09.572-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:12:09.573-07:00 level=INFO source=server.go:1390 msg="llama runner started in 22.48 seconds"
[GIN] 2026/04/02 - 10:12:09 | 200 | 22.747522209s |       127.0.0.1 | POST     "/api/generate"
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml-metal-context.m:235: fatal error
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   ollama                              0x0000000104ff6ae4 ggml_print_backtrace + 276
1   ollama                              0x0000000104ff6cd0 ggml_abort + 156
2   ollama                              0x000000010525f340 ggml_metal_synchronize + 208
3   ollama                              0x0000000105015ae0 ggml_backend_sched_graph_compute_async + 924
4   ollama                              0x000000010508b888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160
5   ollama                              0x000000010508b538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588
6   ollama                              0x000000010508cc04 _ZN13llama_context6decodeERK11llama_batch + 1556
7   ollama                              0x00000001050914a0 llama_decode + 20
8   ollama                              0x0000000104faf3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72
9   ollama                              0x00000001040df20c ollama + 520716
SIGABRT: abort
PC=0x183ac4388 m=12 sigcode=0
signal arrived during cgo execution

goroutine 50 gp=0x140004eaa80 m=12 mp=0x14000428808 [syscall]:
runtime.cgocall(0x104faf398, 0x14000294b58)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000294b20 sp=0x14000294ae0 pc=0x1040d3974
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x156e073c0, {0x10, 0x153822c00, 0x0, 0x153823400, 0x153823c00, 0x153809a00, 0x14fc04510})
	_cgo_gotypes.go:685 +0x34 fp=0x14000294b50 sp=0x14000294b20 pc=0x104524c44
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	/Users/runner/work/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0x1400012ca00?, 0x1040d72f8?)
	/Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000294c40 sp=0x14000294b50 pc=0x104527008
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000612140, 0x140000b46e0, 0x1400024ef18)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000294ed0 sp=0x14000294c40 pc=0x1045c8058
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000612140, {0x105875a30, 0x140006100a0})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000294fa0 sp=0x14000294ed0 pc=0x1045c7d04
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000294fd0 sp=0x14000294fa0 pc=0x1045cc210
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000294fd0 sp=0x14000294fd0 pc=0x1040df414
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c

goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140003db710 sp=0x140003db6f0 pc=0x1040d6e98
runtime.netpollblock(0x140003db7a8?, 0x415b7d0?, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140003db750 sp=0x140003db710 pc=0x10409c8f8
internal/poll.runtime_pollWait(0x14e542550, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140003db780 sp=0x140003db750 pc=0x1040d6050
internal/poll.(*pollDesc).wait(0x1400060e100?, 0x10407eccc?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140003db7b0 sp=0x140003db780 pc=0x104156fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x1400060e100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x140003db860 sp=0x140003db7b0 pc=0x10415b8bc
net.(*netFD).accept(0x1400060e100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x140003db920 sp=0x140003db860 pc=0x1041cbb28
net.(*TCPListener).accept(0x140007220c0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x140003db970 sp=0x140003db920 pc=0x1041e0304
net.(*TCPListener).Accept(0x140007220c0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140003db9b0 sp=0x140003db970 pc=0x1041df2ec
net/http.(*onceCloseListener).Accept(0x1400016dcb0?)
	<autogenerated>:1 +0x30 fp=0x140003db9d0 sp=0x140003db9b0 pc=0x1043c8cc0
net/http.(*Server).Serve(0x1400012c900, {0x105872fc0, 0x140007220c0})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x140003dbb00 sp=0x140003db9d0 pc=0x1043a2400
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000132140, 0x4, 0x4})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x140003dbcd0 sp=0x140003dbb00 pc=0x1045cbfec
github.com/ollama/ollama/runner.Execute({0x14000132130?, 0x0?, 0x0?})
	/Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x140003dbd10 sp=0x140003dbcd0 pc=0x1047086fc
github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035400?, {0x1052b4986?, 0x4?, 0x1052b498a?})
	/Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x140003dbd40 sp=0x140003dbd10 pc=0x104e0d714
github.com/spf13/cobra.(*Command).execute(0x14000313b08, {0x14000533680, 0x4, 0x4})
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x140003dbe60 sp=0x140003dbd40 pc=0x10423a9c8
github.com/spf13/cobra.(*Command).ExecuteC(0x14000236908)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x140003dbf20 sp=0x140003dbe60 pc=0x10423b110
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	/Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x140003dbf40 sp=0x140003dbf20 pc=0x104e0ee94
runtime.main()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x140003dbfd0 sp=0x140003dbf40 pc=0x1040a3464
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140003dbfd0 sp=0x140003dbfd0 pc=0x1040df414

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x1040d6e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.forcegchelper()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x1040a37b8
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x1040df414
created by runtime.init.7 in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x1040d6e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.bgsweep(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x10408e898
runtime.gcenable.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x104082698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x1040df414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x1054d4360?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x1040d6e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x106323960)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x10408c32c
runtime.bgscavenge(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x10408c8cc
runtime.gcenable.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x104082638
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x1040df414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x180006c5c8?, 0x10670db88?, 0xc0?, 0x85?, 0x1c0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x1040d6e98
runtime.runfinq()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x104081698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x1040df414
created by runtime.createfing in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80

goroutine 5 gp=0x14000003a40 m=nil [chan receive]:
runtime.gopark(0x140000b7180?, 0x14000404018?, 0x48?, 0xe7?, 0x10419fc58?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e6f0 sp=0x1400006e6d0 pc=0x1040d6e98
runtime.chanrecv(0x1400003a230, 0x0, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x1400006e770 sp=0x1400006e6f0 pc=0x104073a0c
runtime.chanrecv1(0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x1400006e7a0 sp=0x1400006e770 pc=0x1040735a4
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x1400006e7d0 sp=0x1400006e7a0 pc=0x1040858bc
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x1040df414
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78

goroutine 6 gp=0x14000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 7 gp=0x14000003dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 34 gp=0x14000306000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068710 sp=0x140000686f0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000687b0 sp=0x14000068710 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250710 sp=0x140002506f0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002507b0 sp=0x14000250710 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002507d0 sp=0x140002507b0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002507d0 sp=0x140002507d0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 8 gp=0x140004ea000 m=nil [GC worker (idle)]:
runtime.gopark(0x6bd9a9c7cfb6c?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ff10 sp=0x1400006fef0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006ffb0 sp=0x1400006ff10 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006ffd0 sp=0x1400006ffb0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006ffd0 sp=0x1400006ffd0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 35 gp=0x140003061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x6bd9a9c800e88?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 9 gp=0x140004ea1c0 m=nil [GC worker (idle)]:
runtime.gopark(0x6bd9a9c7ccb3f?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 10 gp=0x140004ea380 m=nil [GC worker (idle)]:
runtime.gopark(0x6bd9a9c7ca964?, 0x3?, 0x8a?, 0x5b?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024cf10 sp=0x1400024cef0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024cfb0 sp=0x1400024cf10 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024cfd0 sp=0x1400024cfb0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024cfd0 sp=0x1400024cfd0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 11 gp=0x140004ea540 m=nil [GC worker (idle)]:
runtime.gopark(0x6bd9a9c7c8acb?, 0x140004e41e0?, 0x1b?, 0xa?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024d710 sp=0x1400024d6f0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024d7b0 sp=0x1400024d710 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024d7d0 sp=0x1400024d7b0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024d7d0 sp=0x1400024d7d0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 12 gp=0x140004ea700 m=nil [GC worker (idle)]:
runtime.gopark(0x6bd9a9c7cf007?, 0x1?, 0x70?, 0x94?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024df10 sp=0x1400024def0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024dfb0 sp=0x1400024df10 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024dfd0 sp=0x1400024dfb0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024dfd0 sp=0x1400024dfd0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 13 gp=0x140004ea8c0 m=nil [GC worker (idle)]:
runtime.gopark(0x106371000?, 0x1?, 0x6c?, 0x52?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024e710 sp=0x1400024e6f0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024e7b0 sp=0x1400024e710 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024e7d0 sp=0x1400024e7b0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024e7d0 sp=0x1400024e7d0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 36 gp=0x14000306380 m=nil [GC worker (idle)]:
runtime.gopark(0x6bd9a9c802e45?, 0x3?, 0xd8?, 0xcf?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000083f10 sp=0x14000083ef0 pc=0x1040d6e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000083fb0 sp=0x14000083f10 pc=0x104084b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000083fd0 sp=0x14000083fb0 pc=0x104084a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000083fd0 sp=0x14000083fd0 pc=0x1040df414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 51 gp=0x140004eac40 m=nil [select]:
runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x1040d6e98
runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x1040b6ad4
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000612140, {0x1058731a0, 0x140003dc460}, 0x14000174500)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1045c99dc
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x1058731a0?, 0x140003dc460?}, 0x14000045b28?)
	<autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x1045cc600
net/http.HandlerFunc.ServeHTTP(0x1400061e000?, {0x1058731a0?, 0x140003dc460?}, 0x14000045b10?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10439ee28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x1058731a0, 0x140003dc460}, 0x14000174500)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x1043a09b4
net/http.serverHandler.ServeHTTP({0x10586f230?}, {0x1058731a0?, 0x140003dc460?}, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x1043bc69c
net/http.(*conn).serve(0x1400016dcb0, {0x1058759f8, 0x1400060c390})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x10439d5cc
net/http.(*Server).Serve.gowrap3()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x1043a2790
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x1040df414
created by net/http.(*Server).Serve in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8

goroutine 141 gp=0x140004eafc0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1040fac30?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000332d80 sp=0x14000332d60 pc=0x1040d6e98
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000332dc0 sp=0x14000332d80 pc=0x10409c8f8
internal/poll.runtime_pollWait(0x14e542438, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000332df0 sp=0x14000332dc0 pc=0x1040d6050
internal/poll.(*pollDesc).wait(0x1400060e000?, 0x1400060c3d1?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000332e20 sp=0x14000332df0 pc=0x104156fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400060e000, {0x1400060c3d1, 0x1, 0x1})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x14000332ec0 sp=0x14000332e20 pc=0x10415829c
net.(*netFD).Read(0x1400060e000, {0x1400060c3d1?, 0x14000332f58?, 0x104398044?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x14000332f10 sp=0x14000332ec0 pc=0x1041ca0f8
net.(*conn).Read(0x14000070060, {0x1400060c3d1?, 0x0?, 0x0?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x14000332f60 sp=0x14000332f10 pc=0x1041d6fc4
net/http.(*connReader).backgroundRead(0x1400060c3c0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x14000332fb0 sp=0x14000332f60 pc=0x104397f40
net/http.(*connReader).startBackgroundRead.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x14000332fd0 sp=0x14000332fb0 pc=0x104397e28
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000332fd0 sp=0x14000332fd0 pc=0x1040df414
created by net/http.(*connReader).startBackgroundRead in goroutine 51
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x183a09df8
r5      0x17981f650
r6      0x36
r7      0x0
r8      0x7b7c0b534c79967d
r9      0x7b7c0b5235fbe67d
r10     0x3bb
r11     0x6
r12     0x6
r13     0x17981f382
r14     0x1023263a8
r15     0x1
r16     0x148
r17     0x1f3a40ac0
r18     0x0
r19     0x6
r20     0x3213
r21     0x1798270e0
r22     0x0
r23     0x2
r24     0x156e079c8
r25     0x179820a08
r26     0xd088402c0
r27     0xd08840000
r28     0x1
r29     0x17981ff40
lr      0x183afd88c
sp      0x17981ff20
pc      0x183ac4388
fault   0x183ac4388
time=2026-04-02T10:12:38.031-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50145/completion\": EOF"
time=2026-04-02T10:12:38.031-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
[GIN] 2026/04/02 - 10:12:38 | 500 | 22.210161209s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/04/02 - 10:40:11 | 200 |      45.292µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/02 - 10:40:11 | 404 |    7.332333ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/02 - 10:40:11 | 200 |  319.062708ms |       127.0.0.1 | POST     "/api/pull"
[GIN] 2026/04/02 - 10:40:19 | 200 |      32.416µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/02 - 10:40:19 | 200 |  100.795667ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/02 - 10:40:19 | 200 |   52.591084ms |       127.0.0.1 | POST     "/api/show"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /Users/micseydel/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_0
print_info: file size   = 4.33 GiB (4.64 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.8000 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 8.03 B
print_info: general.name     = Meta-Llama-3-8B-Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-04-02T10:40:19.984-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=8192
time=2026-04-02T10:40:19.984-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --port 50182"
time=2026-04-02T10:40:19.988-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="51.4 GiB" free_swap="0 B"
time=2026-04-02T10:40:19.988-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:40:19.988-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=33 requested=-1
time=2026-04-02T10:40:19.988-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="4.1 GiB"
time=2026-04-02T10:40:19.988-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="1.0 GiB"
time=2026-04-02T10:40:19.988-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="560.0 MiB"
time=2026-04-02T10:40:19.988-07:00 level=INFO source=device.go:272 msg="total memory" size="5.6 GiB"
time=2026-04-02T10:40:20.013-07:00 level=INFO source=runner.go:965 msg="starting go runner"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 51539.61 MB
time=2026-04-02T10:40:20.016-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:40:20.095-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50182"
time=2026-04-02T10:40:20.100-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:8192 KvCacheType: NumThreads:8 GPULayers:33[ID:0 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:40:20.100-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:40:20.100-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /Users/micseydel/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_0
print_info: file size   = 4.33 GiB (4.64 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.8000 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 8192
print_info: n_embd           = 4096
print_info: n_embd_inp       = 4096
print_info: n_layer          = 32
print_info: n_head           = 32
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 4
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 14336
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 8192
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 8B
print_info: model params     = 8.03 B
print_info: general.name     = Meta-Llama-3-8B-Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 32 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 33/33 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   281.81 MiB
load_tensors: Metal_Mapped model buffer size =  4155.99 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 8192
llama_context: n_ctx_seq     = 8192
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 500000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M4 Pro
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.50 MiB
llama_kv_cache:      Metal KV buffer size =  1024.00 MiB
llama_kv_cache: size = 1024.00 MiB (  8192 cells,  32 layers,  1/1 seqs), K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context:      Metal compute buffer size =   258.50 MiB
llama_context:        CPU compute buffer size =    24.01 MiB
llama_context: graph nodes  = 999
llama_context: graph splits = 2
time=2026-04-02T10:40:22.363-07:00 level=INFO source=server.go:1390 msg="llama runner started in 2.37 seconds"
time=2026-04-02T10:40:22.363-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-02T10:40:22.363-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:40:22.363-07:00 level=INFO source=server.go:1390 msg="llama runner started in 2.37 seconds"
[GIN] 2026/04/02 - 10:40:22 | 200 |   2.56401975s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2026/04/02 - 10:40:29 | 200 |  5.376164166s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/04/02 - 10:40:40 | 200 |      27.291µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/02 - 10:40:40 | 200 |   98.548667ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/02 - 10:40:40 | 200 |   54.164959ms |       127.0.0.1 | POST     "/api/show"
time=2026-04-02T10:40:40.755-07:00 level=INFO source=sched.go:627 msg="updated VRAM based on existing loaded models" gpu=0 library=Metal total="48.0 GiB" available="42.4 GiB"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-04-02T10:40:40.882-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-04-02T10:40:40.882-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50195"
time=2026-04-02T10:40:40.887-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="46.3 GiB" free_swap="0 B"
time=2026-04-02T10:40:40.887-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="41.9 GiB" free="42.4 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:40:40.887-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:40:40.887-07:00 level=INFO source=server.go:1031 msg="model requires more gpu memory than is currently available, evicting a model to make space" "loaded layers"=41
time=2026-04-02T10:40:40.931-07:00 level=INFO source=runner.go:965 msg="starting go runner"
time=2026-04-02T10:40:40.978-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="49.3 GiB" free_swap="0 B"
time=2026-04-02T10:40:40.978-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:40:40.978-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 51539.61 MB
time=2026-04-02T10:40:40.932-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:40:41.012-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50195"
time=2026-04-02T10:40:41.023-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:40:41.024-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:40:41.024-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 8192
print_info: n_embd_inp       = 8192
print_info: n_layer          = 80
print_info: n_head           = 64
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 28672
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 70B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)

load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloaded 30/81 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 40543.11 MiB
load_tensors: Metal_Mapped model buffer size = 39721.13 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 131072
llama_context: n_ctx_seq     = 131072
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 500000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M4 Pro
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.52 MiB
llama_kv_cache:        CPU KV buffer size = 25600.00 MiB
llama_kv_cache:      Metal KV buffer size = 15360.00 MiB
llama_kv_cache: size = 40960.00 MiB (131072 cells,  80 layers,  1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context:      Metal compute buffer size =   328.01 MiB
llama_context:        CPU compute buffer size =   448.01 MiB
llama_context: graph nodes  = 2487
llama_context: graph splits = 503 (with bs=512), 3 (with bs=1)
time=2026-04-02T10:40:45.792-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.91 seconds"
time=2026-04-02T10:40:45.793-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-02T10:40:45.793-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:40:45.793-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.91 seconds"
[GIN] 2026/04/02 - 10:40:45 | 200 |  5.097501292s |       127.0.0.1 | POST     "/api/generate"
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml-metal-context.m:235: fatal error
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   ollama                              0x000000010512aae4 ggml_print_backtrace + 276
1   ollama                              0x000000010512acd0 ggml_abort + 156
2   ollama                              0x0000000105393340 ggml_metal_synchronize + 208
3   ollama                              0x0000000105149ae0 ggml_backend_sched_graph_compute_async + 924
4   ollama                              0x00000001051bf888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160
5   ollama                              0x00000001051bf538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588
6   ollama                              0x00000001051c0c04 _ZN13llama_context6decodeERK11llama_batch + 1556
7   ollama                              0x00000001051c54a0 llama_decode + 20
8   ollama                              0x00000001050e33e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72
9   ollama                              0x000000010421320c ollama + 520716
SIGABRT: abort
PC=0x183ac4388 m=3 sigcode=0
signal arrived during cgo execution

goroutine 7 gp=0x140002f0000 m=3 mp=0x14000073008 [syscall]:
runtime.cgocall(0x1050e3398, 0x14000083b58)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000083b20 sp=0x14000083ae0 pc=0x104207974
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x12fd04760, {0x10, 0x13a829400, 0x0, 0x13a829c00, 0x13a82a400, 0x13a83be00, 0x131604710})
	_cgo_gotypes.go:685 +0x34 fp=0x14000083b50 sp=0x14000083b20 pc=0x104658c44
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	/Users/runner/work/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0x14000034300?, 0x10420b2f8?)
	/Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000083c40 sp=0x14000083b50 pc=0x10465b008
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x140000b8140, 0x140002a6320, 0x1400024f718)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000083ed0 sp=0x14000083c40 pc=0x1046fc058
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x140000b8140, {0x1059a9a30, 0x140000b60a0})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000083fa0 sp=0x14000083ed0 pc=0x1046fbd04
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000083fd0 sp=0x14000083fa0 pc=0x104700210
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000083fd0 sp=0x14000083fd0 pc=0x104213414
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c

goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000519710 sp=0x140005196f0 pc=0x10420ae98
runtime.netpollblock(0x140004a37a8?, 0x428f7d0?, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000519750 sp=0x14000519710 pc=0x1041d08f8
internal/poll.runtime_pollWait(0x12e5ceed0, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000519780 sp=0x14000519750 pc=0x10420a050
internal/poll.(*pollDesc).wait(0x140000b4100?, 0x104291a38?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005197b0 sp=0x14000519780 pc=0x10428afe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x140000b4100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x14000519860 sp=0x140005197b0 pc=0x10428f8bc
net.(*netFD).accept(0x140000b4100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x14000519920 sp=0x14000519860 pc=0x1042ffb28
net.(*TCPListener).accept(0x140000b2080)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x14000519970 sp=0x14000519920 pc=0x104314304
net.(*TCPListener).Accept(0x140000b2080)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140005199b0 sp=0x14000519970 pc=0x1043132ec
net/http.(*onceCloseListener).Accept(0x1400016dcb0?)
	<autogenerated>:1 +0x30 fp=0x140005199d0 sp=0x140005199b0 pc=0x1044fccc0
net/http.(*Server).Serve(0x14000296100, {0x1059a6fc0, 0x140000b2080})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x14000519b00 sp=0x140005199d0 pc=0x1044d6400
github.com/ollama/ollama/runner/llamarunner.Execute({0x140001101a0, 0x4, 0x4})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x14000519cd0 sp=0x14000519b00 pc=0x1046fffec
github.com/ollama/ollama/runner.Execute({0x14000110190?, 0x0?, 0x0?})
	/Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x14000519d10 sp=0x14000519cd0 pc=0x10483c6fc
github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035600?, {0x1053e8986?, 0x4?, 0x1053e898a?})
	/Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x14000519d40 sp=0x14000519d10 pc=0x104f41714
github.com/spf13/cobra.(*Command).execute(0x140004dfb08, {0x140002899c0, 0x4, 0x4})
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x14000519e60 sp=0x14000519d40 pc=0x10436e9c8
github.com/spf13/cobra.(*Command).ExecuteC(0x1400029a908)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x14000519f20 sp=0x14000519e60 pc=0x10436f110
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	/Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x14000519f40 sp=0x14000519f20 pc=0x104f42e94
runtime.main()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x14000519fd0 sp=0x14000519f40 pc=0x1041d7464
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000519fd0 sp=0x14000519fd0 pc=0x104213414

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x10420ae98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.forcegchelper()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x1041d77b8
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x104213414
created by runtime.init.7 in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24

goroutine 3 gp=0x14000003500 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x10420ae98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.bgsweep(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x1041c2898
runtime.gcenable.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x1041b6698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x104213414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c

goroutine 4 gp=0x140000036c0 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x105608360?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x10420ae98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x106457960)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x1041c032c
runtime.bgscavenge(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x1041c08cc
runtime.gcenable.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1041b6638
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x104213414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x180006c5c8?, 0x106841b88?, 0xc0?, 0xc5?, 0x1c0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x10420ae98
runtime.runfinq()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x1041b5698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x104213414
created by runtime.createfing in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80

goroutine 34 gp=0x140002f01c0 m=nil [chan receive]:
runtime.gopark(0x140002a9220?, 0x1400031c0c0?, 0x48?, 0x87?, 0x1042d3c58?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000686f0 sp=0x140000686d0 pc=0x10420ae98
runtime.chanrecv(0x140002f81c0, 0x0, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x14000068770 sp=0x140000686f0 pc=0x1041a7a0c
runtime.chanrecv1(0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x140000687a0 sp=0x14000068770 pc=0x1041a75a4
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x140000687d0 sp=0x140000687a0 pc=0x1041b98bc
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x104213414
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78

goroutine 35 gp=0x140002f0380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 36 gp=0x140002f0540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250710 sp=0x140002506f0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002507b0 sp=0x14000250710 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002507d0 sp=0x140002507b0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002507d0 sp=0x140002507d0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 37 gp=0x140002f0700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069f10 sp=0x14000069ef0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000069fb0 sp=0x14000069f10 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 38 gp=0x140002f08c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 39 gp=0x140002f0a80 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf2e4d42ef51?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006af10 sp=0x1400006aef0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006afb0 sp=0x1400006af10 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006afd0 sp=0x1400006afb0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006afd0 sp=0x1400006afd0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 40 gp=0x140002f0c40 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf2e4d433aad?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006b710 sp=0x1400006b6f0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006b7b0 sp=0x1400006b710 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006b7d0 sp=0x1400006b7b0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006b7d0 sp=0x1400006b7d0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 41 gp=0x140002f0e00 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf2e4d437e62?, 0x1?, 0x1c?, 0x1e?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006bf10 sp=0x1400006bef0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006bfb0 sp=0x1400006bf10 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006bfd0 sp=0x1400006bfb0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006bfd0 sp=0x1400006bfd0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 42 gp=0x140002f0fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf2e4d431732?, 0x1?, 0x34?, 0x28?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 20 gp=0x14000103180 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf2e4d42e02e?, 0x1?, 0xcc?, 0x62?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250f10 sp=0x14000250ef0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000250fb0 sp=0x14000250f10 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000250fd0 sp=0x14000250fb0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000250fd0 sp=0x14000250fd0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf2e4d43274f?, 0x1?, 0xf?, 0x23?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x10420ae98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x1041b8b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x1041b8a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x104213414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 8 gp=0x140002f1500 m=nil [select]:
runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x10420ae98
runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x1041eaad4
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x140000b8140, {0x1059a71a0, 0x140002629a0}, 0x14000455040)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1046fd9dc
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x1059a71a0?, 0x140002629a0?}, 0x14000045b28?)
	<autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x104700600
net/http.HandlerFunc.ServeHTTP(0x140000bc000?, {0x1059a71a0?, 0x140002629a0?}, 0x14000045b10?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x1044d2e28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x1059a71a0, 0x140002629a0}, 0x14000455040)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x1044d49b4
net/http.serverHandler.ServeHTTP({0x1059a3230?}, {0x1059a71a0?, 0x140002629a0?}, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x1044f069c
net/http.(*conn).serve(0x1400016dcb0, {0x1059a99f8, 0x140000b0360})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x1044d15cc
net/http.(*Server).Serve.gowrap3()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x1044d6790
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x104213414
created by net/http.(*Server).Serve in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8

goroutine 61 gp=0x140002f16c0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x10422ec30?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251580 sp=0x14000251560 pc=0x10420ae98
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140002515c0 sp=0x14000251580 pc=0x1041d08f8
internal/poll.runtime_pollWait(0x12e5cedb8, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140002515f0 sp=0x140002515c0 pc=0x10420a050
internal/poll.(*pollDesc).wait(0x140000b4180?, 0x140004cd571?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000251620 sp=0x140002515f0 pc=0x10428afe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140000b4180, {0x140004cd571, 0x1, 0x1})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x140002516c0 sp=0x14000251620 pc=0x10428c29c
net.(*netFD).Read(0x140000b4180, {0x140004cd571?, 0x14000251758?, 0x1044cc044?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x14000251710 sp=0x140002516c0 pc=0x1042fe0f8
net.(*conn).Read(0x140001241e8, {0x140004cd571?, 0x0?, 0x0?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x14000251760 sp=0x14000251710 pc=0x10430afc4
net/http.(*connReader).backgroundRead(0x140004cd560)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x140002517b0 sp=0x14000251760 pc=0x1044cbf40
net/http.(*connReader).startBackgroundRead.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x140002517d0 sp=0x140002517b0 pc=0x1044cbe28
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002517d0 sp=0x140002517d0 pc=0x104213414
created by net/http.(*connReader).startBackgroundRead in goroutine 8
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x183a09df8
r5      0x16cc6f650
r6      0x36
r7      0x0
r8      0x7e8bdc898cbb1630
r9      0x7e8bdc88e07c6630
r10     0x3bb
r11     0x6
r12     0x6
r13     0x16cc6f382
r14     0x1023263a8
r15     0x1
r16     0x148
r17     0x1f3a40ac0
r18     0x0
r19     0x6
r20     0x1b03
r21     0x16cc770e0
r22     0x0
r23     0x2
r24     0x12fd04d68
r25     0x16cc70a08
r26     0xd004b82c0
r27     0xd004b8000
r28     0x1
r29     0x16cc6ff40
lr      0x183afd88c
sp      0x16cc6ff20
pc      0x183ac4388
fault   0x183ac4388
time=2026-04-02T10:41:08.344-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
time=2026-04-02T10:41:08.344-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50195/completion\": EOF"
[GIN] 2026/04/02 - 10:41:08 | 500 | 18.995847541s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/04/02 - 10:42:03 | 200 |      44.875µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/02 - 10:42:03 | 200 |    98.78775ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/02 - 10:42:03 | 200 |   52.682542ms |       127.0.0.1 | POST     "/api/show"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-04-02T10:42:03.905-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-04-02T10:42:03.906-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50202"
time=2026-04-02T10:42:03.909-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="57.5 GiB" free_swap="0 B"
time=2026-04-02T10:42:03.909-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:42:03.909-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
time=2026-04-02T10:42:03.935-07:00 level=INFO source=runner.go:965 msg="starting go runner"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 51539.61 MB
time=2026-04-02T10:42:03.937-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:42:04.016-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50202"
time=2026-04-02T10:42:04.022-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:42:04.022-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:42:04.022-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 8192
print_info: n_embd_inp       = 8192
print_info: n_layer          = 80
print_info: n_head           = 64
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 28672
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 70B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)

load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloaded 30/81 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 40543.11 MiB
load_tensors: Metal_Mapped model buffer size = 39721.13 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 131072
llama_context: n_ctx_seq     = 131072
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 500000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M4 Pro
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.52 MiB
llama_kv_cache:        CPU KV buffer size = 25600.00 MiB
llama_kv_cache:      Metal KV buffer size = 15360.00 MiB
llama_kv_cache: size = 40960.00 MiB (131072 cells,  80 layers,  1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context:      Metal compute buffer size =   328.01 MiB
llama_context:        CPU compute buffer size =   448.01 MiB
llama_context: graph nodes  = 2487
llama_context: graph splits = 503 (with bs=512), 3 (with bs=1)
time=2026-04-02T10:42:08.795-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.89 seconds"
time=2026-04-02T10:42:08.795-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-02T10:42:08.795-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:42:08.795-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.89 seconds"
[GIN] 2026/04/02 - 10:42:08 | 200 |   5.09026575s |       127.0.0.1 | POST     "/api/generate"
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml-metal-context.m:235: fatal error
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   ollama                              0x0000000101362ae4 ggml_print_backtrace + 276
1   ollama                              0x0000000101362cd0 ggml_abort + 156
2   ollama                              0x00000001015cb340 ggml_metal_synchronize + 208
3   ollama                              0x0000000101381ae0 ggml_backend_sched_graph_compute_async + 924
4   ollama                              0x00000001013f7888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160
5   ollama                              0x00000001013f7538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588
6   ollama                              0x00000001013f8c04 _ZN13llama_context6decodeERK11llama_batch + 1556
7   ollama                              0x00000001013fd4a0 llama_decode + 20
8   ollama                              0x000000010131b3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72
9   ollama                              0x000000010044b20c ollama + 520716
SIGABRT: abort
PC=0x183ac4388 m=4 sigcode=0
signal arrived during cgo execution

goroutine 66 gp=0x140004841c0 m=4 mp=0x14000100008 [syscall]:
runtime.cgocall(0x10131b398, 0x14000080b58)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000080b20 sp=0x14000080ae0 pc=0x10043f974
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x12a504760, {0x10, 0x168008200, 0x0, 0x168008a00, 0x168009200, 0x168009a00, 0x12ba04080})
	_cgo_gotypes.go:685 +0x34 fp=0x14000080b50 sp=0x14000080b20 pc=0x100890c44
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	/Users/runner/work/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0x14000034300?, 0x1004432f8?)
	/Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000080c40 sp=0x14000080b50 pc=0x100893008
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000590140, 0x1400025e230, 0x14000253f18)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000080ed0 sp=0x14000080c40 pc=0x100934058
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000590140, {0x101be1a30, 0x1400058e0a0})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000080fa0 sp=0x14000080ed0 pc=0x100933d04
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000080fd0 sp=0x14000080fa0 pc=0x100938210
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000080fd0 sp=0x14000080fd0 pc=0x10044b414
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c

goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000523710 sp=0x140005236f0 pc=0x100442e98
runtime.netpollblock(0x140004a37a8?, 0x4c77d0?, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000523750 sp=0x14000523710 pc=0x1004088f8
internal/poll.runtime_pollWait(0x12a297dd0, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000523780 sp=0x14000523750 pc=0x100442050
internal/poll.(*pollDesc).wait(0x1400058c100?, 0x1004c9a38?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005237b0 sp=0x14000523780 pc=0x1004c2fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x1400058c100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x14000523860 sp=0x140005237b0 pc=0x1004c78bc
net.(*netFD).accept(0x1400058c100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x14000523920 sp=0x14000523860 pc=0x100537b28
net.(*TCPListener).accept(0x1400058a080)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x14000523970 sp=0x14000523920 pc=0x10054c304
net.(*TCPListener).Accept(0x1400058a080)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140005239b0 sp=0x14000523970 pc=0x10054b2ec
net/http.(*onceCloseListener).Accept(0x140005a2090?)
	<autogenerated>:1 +0x30 fp=0x140005239d0 sp=0x140005239b0 pc=0x100734cc0
net/http.(*Server).Serve(0x1400012c800, {0x101bdefc0, 0x1400058a080})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x14000523b00 sp=0x140005239d0 pc=0x10070e400
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000132140, 0x4, 0x4})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x14000523cd0 sp=0x14000523b00 pc=0x100937fec
github.com/ollama/ollama/runner.Execute({0x14000132130?, 0x0?, 0x0?})
	/Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x14000523d10 sp=0x14000523cd0 pc=0x100a746fc
github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035600?, {0x101620986?, 0x4?, 0x10162098a?})
	/Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x14000523d40 sp=0x14000523d10 pc=0x101179714
github.com/spf13/cobra.(*Command).execute(0x1400030bb08, {0x1400028f9c0, 0x4, 0x4})
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x14000523e60 sp=0x14000523d40 pc=0x1005a69c8
github.com/spf13/cobra.(*Command).ExecuteC(0x140000f8908)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x14000523f20 sp=0x14000523e60 pc=0x1005a7110
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	/Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x14000523f40 sp=0x14000523f20 pc=0x10117ae94
runtime.main()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x14000523fd0 sp=0x14000523f40 pc=0x10040f464
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000523fd0 sp=0x14000523fd0 pc=0x10044b414

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x100442e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.forcegchelper()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10040f7b8
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x10044b414
created by runtime.init.7 in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x100442e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.bgsweep(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x1003fa898
runtime.gcenable.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x1003ee698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x10044b414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x101840360?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x100442e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x10268f960)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x1003f832c
runtime.bgscavenge(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x1003f88cc
runtime.gcenable.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1003ee638
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x10044b414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x180006c5c8?, 0x1293d9b88?, 0xc0?, 0x45?, 0x1c0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x100442e98
runtime.runfinq()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x1003ed698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x10044b414
created by runtime.createfing in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80

goroutine 34 gp=0x140002f01c0 m=nil [chan receive]:
runtime.gopark(0x140002a9220?, 0x1400031c180?, 0x48?, 0x87?, 0x10050bc58?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000686f0 sp=0x140000686d0 pc=0x100442e98
runtime.chanrecv(0x140002f81c0, 0x0, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x14000068770 sp=0x140000686f0 pc=0x1003dfa0c
runtime.chanrecv1(0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x140000687a0 sp=0x14000068770 pc=0x1003df5a4
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x140000687d0 sp=0x140000687a0 pc=0x1003f18bc
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x10044b414
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78

goroutine 35 gp=0x140002f0380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250710 sp=0x140002506f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002507b0 sp=0x14000250710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002507d0 sp=0x140002507b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002507d0 sp=0x140002507d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 36 gp=0x140002f0540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 20 gp=0x14000103180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250f10 sp=0x14000250ef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000250fb0 sp=0x14000250f10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000250fd0 sp=0x14000250fb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000250fd0 sp=0x14000250fd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 21 gp=0x14000103340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251710 sp=0x140002516f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002517b0 sp=0x14000251710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002517d0 sp=0x140002517b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002517d0 sp=0x140002517d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 22 gp=0x14000103500 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf41a057200f?, 0x3?, 0x22?, 0x17?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251f10 sp=0x14000251ef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000251fb0 sp=0x14000251f10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000251fd0 sp=0x14000251fb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000251fd0 sp=0x14000251fd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 37 gp=0x140002f0a80 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf41a0571a5d?, 0x3?, 0x61?, 0xdc?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069f10 sp=0x14000069ef0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000069fb0 sp=0x14000069f10 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 38 gp=0x140002f0c40 m=nil [GC worker (idle)]:
runtime.gopark(0x6bf41a0567389?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 50 gp=0x14000484000 m=nil [GC worker (idle)]:
runtime.gopark(0x1026dd000?, 0x1?, 0x76?, 0x16?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 7 gp=0x14000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x1026dd000?, 0x1?, 0x2e?, 0xad?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x100442e98
runtime.gcBgMarkWorker(0x140002f9420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x1003f0b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x1003f0a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x10044b414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 67 gp=0x14000484380 m=nil [select]:
runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x100442e98
runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x100422ad4
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000590140, {0x101bdf1a0, 0x1400052e700}, 0x1400026f2c0)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1009359dc
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x101bdf1a0?, 0x1400052e700?}, 0x14000045b28?)
	<autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x100938600
net/http.HandlerFunc.ServeHTTP(0x14000594000?, {0x101bdf1a0?, 0x1400052e700?}, 0x14000045b10?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10070ae28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x101bdf1a0, 0x1400052e700}, 0x1400026f2c0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x10070c9b4
net/http.serverHandler.ServeHTTP({0x101bdb230?}, {0x101bdf1a0?, 0x1400052e700?}, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x10072869c
net/http.(*conn).serve(0x140005a2090, {0x101be19f8, 0x14000588360})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x1007095cc
net/http.(*Server).Serve.gowrap3()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x10070e790
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x10044b414
created by net/http.(*Server).Serve in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8

goroutine 32 gp=0x140002f1180 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x100466c30?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000253580 sp=0x14000253560 pc=0x100442e98
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140002535c0 sp=0x14000253580 pc=0x1004088f8
internal/poll.runtime_pollWait(0x12a297cb8, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140002535f0 sp=0x140002535c0 pc=0x100442050
internal/poll.(*pollDesc).wait(0x1400058c180?, 0x14000412041?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000253620 sp=0x140002535f0 pc=0x1004c2fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400058c180, {0x14000412041, 0x1, 0x1})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x140002536c0 sp=0x14000253620 pc=0x1004c429c
net.(*netFD).Read(0x1400058c180, {0x14000412041?, 0x14000253758?, 0x100704044?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x14000253710 sp=0x140002536c0 pc=0x1005360f8
net.(*conn).Read(0x14000070030, {0x14000412041?, 0x0?, 0x0?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x14000253760 sp=0x14000253710 pc=0x100542fc4
net/http.(*connReader).backgroundRead(0x14000412030)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x140002537b0 sp=0x14000253760 pc=0x100703f40
net/http.(*connReader).startBackgroundRead.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x140002537d0 sp=0x140002537b0 pc=0x100703e28
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002537d0 sp=0x140002537d0 pc=0x10044b414
created by net/http.(*connReader).startBackgroundRead in goroutine 67
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x183a09df8
r5      0x171a4b650
r6      0x36
r7      0x0
r8      0x46aa15ae81e80719
r9      0x46aa15aff04d3719
r10     0x3bb
r11     0x6
r12     0x6
r13     0x171a4b382
r14     0x1023263a8
r15     0x1
r16     0x148
r17     0x1f3a40ac0
r18     0x0
r19     0x6
r20     0xc03
r21     0x171a530e0
r22     0x0
r23     0x2
r24     0x12a504d68
r25     0x171a4ca08
r26     0xcf05e82c0
r27     0xcf05e8000
r28     0x1
r29     0x171a4bf40
lr      0x183afd88c
sp      0x171a4bf20
pc      0x183ac4388
fault   0x183ac4388
time=2026-04-02T10:42:35.087-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
time=2026-04-02T10:42:35.087-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50202/completion\": EOF"
[GIN] 2026/04/02 - 10:42:35 | 500 | 20.477471958s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/04/02 - 10:49:38 | 200 |      59.042µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/04/02 - 10:52:21 | 200 |      43.292µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/02 - 10:52:21 | 200 |  101.348667ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/02 - 10:52:22 | 200 |   52.597916ms |       127.0.0.1 | POST     "/api/show"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-04-02T10:52:22.243-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-04-02T10:52:22.243-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50225"
time=2026-04-02T10:52:22.249-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="60.9 GiB" free_swap="0 B"
time=2026-04-02T10:52:22.249-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:52:22.249-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
time=2026-04-02T10:52:22.275-07:00 level=INFO source=runner.go:965 msg="starting go runner"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 51539.61 MB
time=2026-04-02T10:52:22.277-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:52:22.353-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50225"
time=2026-04-02T10:52:22.362-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
time=2026-04-02T10:52:22.362-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:52:22.362-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 8192
print_info: n_embd_inp       = 8192
print_info: n_layer          = 80
print_info: n_head           = 64
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 28672
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 70B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)

load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloaded 30/81 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 40543.11 MiB
load_tensors: Metal_Mapped model buffer size = 39721.13 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 131072
llama_context: n_ctx_seq     = 131072
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 500000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M4 Pro
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.52 MiB
llama_kv_cache:        CPU KV buffer size = 25600.00 MiB
llama_kv_cache:      Metal KV buffer size = 15360.00 MiB
llama_kv_cache: size = 40960.00 MiB (131072 cells,  80 layers,  1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context:      Metal compute buffer size =   328.01 MiB
llama_context:        CPU compute buffer size =   448.01 MiB
llama_context: graph nodes  = 2487
llama_context: graph splits = 503 (with bs=512), 3 (with bs=1)
time=2026-04-02T10:52:26.632-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.38 seconds"
time=2026-04-02T10:52:26.632-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-02T10:52:26.632-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:52:26.633-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.38 seconds"
[GIN] 2026/04/02 - 10:52:26 | 200 |  4.576389625s |       127.0.0.1 | POST     "/api/generate"
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml-metal-context.m:235: fatal error
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   ollama                              0x00000001030b6ae4 ggml_print_backtrace + 276
1   ollama                              0x00000001030b6cd0 ggml_abort + 156
2   ollama                              0x000000010331f340 ggml_metal_synchronize + 208
3   ollama                              0x00000001030d5ae0 ggml_backend_sched_graph_compute_async + 924
4   ollama                              0x000000010314b888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160
5   ollama                              0x000000010314b538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588
6   ollama                              0x000000010314cc04 _ZN13llama_context6decodeERK11llama_batch + 1556
7   ollama                              0x00000001031514a0 llama_decode + 20
8   ollama                              0x000000010306f3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72
9   ollama                              0x000000010219f20c ollama + 520716
SIGABRT: abort
PC=0x183ac4388 m=10 sigcode=0
signal arrived during cgo execution

goroutine 13 gp=0x14000486380 m=10 mp=0x140000a5808 [syscall]:
runtime.cgocall(0x10306f398, 0x14000083b58)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000083b20 sp=0x14000083ae0 pc=0x102193974
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x12dc04760, {0x10, 0x168025200, 0x0, 0x168025a00, 0x168026200, 0x168026a00, 0x12dc05950})
	_cgo_gotypes.go:685 +0x34 fp=0x14000083b50 sp=0x14000083b20 pc=0x1025e4c44
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	/Users/runner/work/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0x14000035400?, 0x1021972f8?)
	/Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000083c40 sp=0x14000083b50 pc=0x1025e7008
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000276140, 0x14000134140, 0x1400024a718)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000083ed0 sp=0x14000083c40 pc=0x102688058
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000276140, {0x103935a30, 0x14000528370})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000083fa0 sp=0x14000083ed0 pc=0x102687d04
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000083fd0 sp=0x14000083fa0 pc=0x10268c210
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000083fd0 sp=0x14000083fd0 pc=0x10219f414
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c

goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000337710 sp=0x140003376f0 pc=0x102196e98
runtime.netpollblock(0x140004a77a8?, 0x221b7d0?, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000337750 sp=0x14000337710 pc=0x10215c8f8
internal/poll.runtime_pollWait(0x12b77b150, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000337780 sp=0x14000337750 pc=0x102196050
internal/poll.(*pollDesc).wait(0x14000274100?, 0x10221da38?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140003377b0 sp=0x14000337780 pc=0x102216fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000274100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x14000337860 sp=0x140003377b0 pc=0x10221b8bc
net.(*netFD).accept(0x14000274100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x14000337920 sp=0x14000337860 pc=0x10228bb28
net.(*TCPListener).accept(0x1400051e480)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x14000337970 sp=0x14000337920 pc=0x1022a0304
net.(*TCPListener).Accept(0x1400051e480)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140003379b0 sp=0x14000337970 pc=0x10229f2ec
net/http.(*onceCloseListener).Accept(0x140003a8090?)
	<autogenerated>:1 +0x30 fp=0x140003379d0 sp=0x140003379b0 pc=0x102488cc0
net/http.(*Server).Serve(0x140005c4200, {0x103932fc0, 0x1400051e480})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x14000337b00 sp=0x140003379d0 pc=0x102462400
github.com/ollama/ollama/runner/llamarunner.Execute({0x140000322c0, 0x4, 0x4})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x14000337cd0 sp=0x14000337b00 pc=0x10268bfec
github.com/ollama/ollama/runner.Execute({0x140000322b0?, 0x0?, 0x0?})
	/Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x14000337d10 sp=0x14000337cd0 pc=0x1027c86fc
github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035b00?, {0x103374986?, 0x4?, 0x10337498a?})
	/Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x14000337d40 sp=0x14000337d10 pc=0x102ecd714
github.com/spf13/cobra.(*Command).execute(0x14000305b08, {0x14000439380, 0x4, 0x4})
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x14000337e60 sp=0x14000337d40 pc=0x1022fa9c8
github.com/spf13/cobra.(*Command).ExecuteC(0x14000124908)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x14000337f20 sp=0x14000337e60 pc=0x1022fb110
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	/Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x14000337f40 sp=0x14000337f20 pc=0x102ecee94
runtime.main()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x14000337fd0 sp=0x14000337f40 pc=0x102163464
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000337fd0 sp=0x14000337fd0 pc=0x10219f414

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x102196e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.forcegchelper()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x1021637b8
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x10219f414
created by runtime.init.7 in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x102196e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.bgsweep(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x10214e898
runtime.gcenable.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x102142698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x10219f414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x103594360?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x102196e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x1043e3960)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x10214c32c
runtime.bgscavenge(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x10214c8cc
runtime.gcenable.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x102142638
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x10219f414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac

goroutine 5 gp=0x14000003c00 m=nil [finalizer wait]:
runtime.gopark(0x180006c5c8?, 0x104603ef0?, 0x8?, 0x81?, 0x1c0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x102196e98
runtime.runfinq()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x102141698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x10219f414
created by runtime.createfing in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80

goroutine 18 gp=0x14000102540 m=nil [chan receive]:
runtime.gopark(0x14000137180?, 0x1400041c018?, 0x48?, 0x87?, 0x10225fc58?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000686f0 sp=0x140000686d0 pc=0x102196e98
runtime.chanrecv(0x140002801c0, 0x0, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x14000068770 sp=0x140000686f0 pc=0x102133a0c
runtime.chanrecv1(0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x140000687a0 sp=0x14000068770 pc=0x1021335a4
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x140000687d0 sp=0x140000687a0 pc=0x1021458bc
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x10219f414
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78

goroutine 19 gp=0x14000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 20 gp=0x140001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 6 gp=0x140001dc540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006e7b0 sp=0x1400006e710 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 7 gp=0x140001dc700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 8 gp=0x140001dc8c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 9 gp=0x140001dca80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ff10 sp=0x1400006fef0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006ffb0 sp=0x1400006ff10 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006ffd0 sp=0x1400006ffb0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006ffd0 sp=0x1400006ffd0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 10 gp=0x140001dcc40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 34 gp=0x14000306000 m=nil [GC worker (idle)]:
runtime.gopark(0x104431000?, 0x1?, 0xac?, 0x14?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000248710 sp=0x140002486f0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002487b0 sp=0x14000248710 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002487d0 sp=0x140002487b0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002487d0 sp=0x140002487d0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 11 gp=0x140001dce00 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfd195b0c80e?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024cf10 sp=0x1400024cef0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024cfb0 sp=0x1400024cf10 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024cfd0 sp=0x1400024cfb0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024cfd0 sp=0x1400024cfd0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 12 gp=0x140001dcfc0 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfd195b05937?, 0x1?, 0x2f?, 0x83?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024d710 sp=0x1400024d6f0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024d7b0 sp=0x1400024d710 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024d7d0 sp=0x1400024d7b0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024d7d0 sp=0x1400024d7d0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 35 gp=0x140003061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfd195b0a492?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000248f10 sp=0x14000248ef0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000248fb0 sp=0x14000248f10 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000248fd0 sp=0x14000248fb0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000248fd0 sp=0x14000248fd0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 21 gp=0x14000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfd195b0c520?, 0x3?, 0x46?, 0x11?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069f10 sp=0x14000069ef0 pc=0x102196e98
runtime.gcBgMarkWorker(0x14000281420)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000069fb0 sp=0x14000069f10 pc=0x102144b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x102144a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x10219f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 14 gp=0x14000486540 m=nil [select]:
runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x102196e98
runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x102176ad4
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000276140, {0x1039331a0, 0x140001c47e0}, 0x140005fe640)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1026899dc
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x1039331a0?, 0x140001c47e0?}, 0x1400032fb28?)
	<autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x10268c600
net/http.HandlerFunc.ServeHTTP(0x1400027a000?, {0x1039331a0?, 0x140001c47e0?}, 0x1400032fb10?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10245ee28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x1039331a0, 0x140001c47e0}, 0x140005fe640)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x1024609b4
net/http.serverHandler.ServeHTTP({0x10392f230?}, {0x1039331a0?, 0x140001c47e0?}, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x10247c69c
net/http.(*conn).serve(0x140003a8090, {0x1039359f8, 0x14000272360})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x10245d5cc
net/http.(*Server).Serve.gowrap3()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x102462790
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x10219f414
created by net/http.(*Server).Serve in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8

goroutine 70 gp=0x14000306540 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1021bac30?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024ad80 sp=0x1400024ad60 pc=0x102196e98
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x1400024adc0 sp=0x1400024ad80 pc=0x10215c8f8
internal/poll.runtime_pollWait(0x12b77b038, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x1400024adf0 sp=0x1400024adc0 pc=0x102196050
internal/poll.(*pollDesc).wait(0x14000274180?, 0x140002974e1?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400024ae20 sp=0x1400024adf0 pc=0x102216fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x14000274180, {0x140002974e1, 0x1, 0x1})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x1400024aec0 sp=0x1400024ae20 pc=0x10221829c
net.(*netFD).Read(0x14000274180, {0x140002974e1?, 0x1400024af58?, 0x102458044?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x1400024af10 sp=0x1400024aec0 pc=0x10228a0f8
net.(*conn).Read(0x140005a4050, {0x140002974e1?, 0x0?, 0x0?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x1400024af60 sp=0x1400024af10 pc=0x102296fc4
net/http.(*connReader).backgroundRead(0x140002974d0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x1400024afb0 sp=0x1400024af60 pc=0x102457f40
net/http.(*connReader).startBackgroundRead.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x1400024afd0 sp=0x1400024afb0 pc=0x102457e28
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024afd0 sp=0x1400024afd0 pc=0x10219f414
created by net/http.(*connReader).startBackgroundRead in goroutine 14
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x183a09df8
r5      0x17264b650
r6      0x36
r7      0x0
r8      0xaa98e3e9645a0b8a
r9      0xaa98e3e8163f3b8a
r10     0x3bb
r11     0x6
r12     0x6
r13     0x17264b382
r14     0x1023263a8
r15     0x1
r16     0x148
r17     0x1f3a40ac0
r18     0x0
r19     0x6
r20     0x3713
r21     0x1726530e0
r22     0x0
r23     0x2
r24     0x12dc04d68
r25     0x17264ca08
r26     0xcf80082c0
r27     0xcf8008000
r28     0x1
r29     0x17264bf40
lr      0x183afd88c
sp      0x17264bf20
pc      0x183ac4388
fault   0x183ac4388
time=2026-04-02T10:52:51.289-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
time=2026-04-02T10:52:51.289-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50225/completion\": EOF"
[GIN] 2026/04/02 - 10:52:51 | 500 |    17.269655s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2026/04/02 - 10:53:38 | 200 |      41.375µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/02 - 10:53:39 | 200 |   97.700291ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/02 - 10:53:39 | 200 |   52.942917ms |       127.0.0.1 | POST     "/api/show"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: no_alloc         = 0
print_info: model type       = ?B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-04-02T10:53:39.295-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-04-02T10:53:39.296-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50238"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="61.0 GiB" free_swap="0 B"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
time=2026-04-02T10:53:39.325-07:00 level=INFO source=runner.go:965 msg="starting go runner"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 51539.61 MB
time=2026-04-02T10:53:39.328-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:53:39.403-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50238"
time=2026-04-02T10:53:39.412-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:53:39.412-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:53:39.413-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv   3:                            general.version str              = 2024-12
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1
llama_model_loader: - kv   6:                         general.size_label str              = 70B
llama_model_loader: - kv   7:                            general.license str              = llama3.1
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.1 70B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                               general.tags arr[str,5]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv  13:                          general.languages arr[str,7]       = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv  14:                          llama.block_count u32              = 80
llama_model_loader: - kv  15:                       llama.context_length u32              = 131072
llama_model_loader: - kv  16:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  17:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  18:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  19:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  20:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  21:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  23:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  24:                          general.file_type u32              = 15
llama_model_loader: - kv  25:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  26:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  27:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  32:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  35:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_K:  441 tensors
llama_model_loader: - type q5_K:   40 tensors
llama_model_loader: - type q6_K:   81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 39.59 GiB (4.82 BPW) 
load: printing all EOG tokens:
load:   - 128001 ('<|end_of_text|>')
load:   - 128008 ('<|eom_id|>')
load:   - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 8192
print_info: n_embd_inp       = 8192
print_info: n_layer          = 80
print_info: n_head           = 64
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 28672
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 70B
print_info: model params     = 70.55 B
print_info: general.name     = Llama 3.1 70B Instruct 2024 12
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128001 '<|end_of_text|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128001 '<|end_of_text|>'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)

load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloaded 30/81 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 40543.11 MiB
load_tensors: Metal_Mapped model buffer size = 39721.13 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 131072
llama_context: n_ctx_seq     = 131072
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 500000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M4 Pro
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
llama_context:        CPU  output buffer size =     0.52 MiB
llama_kv_cache:        CPU KV buffer size = 25600.00 MiB
llama_kv_cache:      Metal KV buffer size = 15360.00 MiB
llama_kv_cache: size = 40960.00 MiB (131072 cells,  80 layers,  1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context:      Metal compute buffer size =   328.01 MiB
llama_context:        CPU compute buffer size =   448.01 MiB
llama_context: graph nodes  = 2487
llama_context: graph splits = 503 (with bs=512), 3 (with bs=1)
time=2026-04-02T10:53:43.934-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.63 seconds"
time=2026-04-02T10:53:43.934-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-02T10:53:43.935-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:53:43.935-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.64 seconds"
[GIN] 2026/04/02 - 10:53:43 | 200 |     4.838439s |       127.0.0.1 | POST     "/api/generate"
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml-metal-context.m:235: fatal error
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   ollama                              0x0000000101d76ae4 ggml_print_backtrace + 276
1   ollama                              0x0000000101d76cd0 ggml_abort + 156
2   ollama                              0x0000000101fdf340 ggml_metal_synchronize + 208
3   ollama                              0x0000000101d95ae0 ggml_backend_sched_graph_compute_async + 924
4   ollama                              0x0000000101e0b888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160
5   ollama                              0x0000000101e0b538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588
6   ollama                              0x0000000101e0cc04 _ZN13llama_context6decodeERK11llama_batch + 1556
7   ollama                              0x0000000101e114a0 llama_decode + 20
8   ollama                              0x0000000101d2f3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72
9   ollama                              0x0000000100e5f20c ollama + 520716
SIGABRT: abort
PC=0x183ac4388 m=10 sigcode=0
signal arrived during cgo execution

goroutine 50 gp=0x140004ea540 m=10 mp=0x14000428808 [syscall]:
runtime.cgocall(0x101d2f398, 0x14000085b58)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000085b20 sp=0x14000085ae0 pc=0x100e53974
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x152f04dd0, {0x10, 0x15080bc00, 0x0, 0x15082f400, 0x15082fc00, 0x150808800, 0x14c804710})
	_cgo_gotypes.go:685 +0x34 fp=0x14000085b50 sp=0x14000085b20 pc=0x1012a4c44
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
	/Users/runner/work/ollama/ollama/llama/llama.go:173
github.com/ollama/ollama/llama.(*Context).Decode(0x14000034300?, 0x100e572f8?)
	/Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000085c40 sp=0x14000085b50 pc=0x1012a7008
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000596140, 0x1400033a8c0, 0x1400030ef18)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000085ed0 sp=0x14000085c40 pc=0x101348058
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000596140, {0x1025f5a30, 0x14000618190})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000085fa0 sp=0x14000085ed0 pc=0x101347d04
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000085fd0 sp=0x14000085fa0 pc=0x10134c210
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000085fd0 sp=0x14000085fd0 pc=0x100e5f414
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c

goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140005bb710 sp=0x140005bb6f0 pc=0x100e56e98
runtime.netpollblock(0x140005bb7a8?, 0xedb7d0?, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140005bb750 sp=0x140005bb710 pc=0x100e1c8f8
internal/poll.runtime_pollWait(0x14b218b50, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140005bb780 sp=0x140005bb750 pc=0x100e56050
internal/poll.(*pollDesc).wait(0x14000594100?, 0x100dfeccc?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005bb7b0 sp=0x140005bb780 pc=0x100ed6fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000594100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x140005bb860 sp=0x140005bb7b0 pc=0x100edb8bc
net.(*netFD).accept(0x14000594100)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x140005bb920 sp=0x140005bb860 pc=0x100f4bb28
net.(*TCPListener).accept(0x140005100c0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x140005bb970 sp=0x140005bb920 pc=0x100f60304
net.(*TCPListener).Accept(0x140005100c0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140005bb9b0 sp=0x140005bb970 pc=0x100f5f2ec
net/http.(*onceCloseListener).Accept(0x140002a8240?)
	<autogenerated>:1 +0x30 fp=0x140005bb9d0 sp=0x140005bb9b0 pc=0x101148cc0
net/http.(*Server).Serve(0x1400012c800, {0x1025f2fc0, 0x140005100c0})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x140005bbb00 sp=0x140005bb9d0 pc=0x101122400
github.com/ollama/ollama/runner/llamarunner.Execute({0x14000132140, 0x4, 0x4})
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x140005bbcd0 sp=0x140005bbb00 pc=0x10134bfec
github.com/ollama/ollama/runner.Execute({0x14000132130?, 0x0?, 0x0?})
	/Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x140005bbd10 sp=0x140005bbcd0 pc=0x1014886fc
github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035500?, {0x102034986?, 0x4?, 0x10203498a?})
	/Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x140005bbd40 sp=0x140005bbd10 pc=0x101b8d714
github.com/spf13/cobra.(*Command).execute(0x14000363b08, {0x14000327580, 0x4, 0x4})
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x140005bbe60 sp=0x140005bbd40 pc=0x100fba9c8
github.com/spf13/cobra.(*Command).ExecuteC(0x14000290908)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x140005bbf20 sp=0x140005bbe60 pc=0x100fbb110
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	/Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x140005bbf40 sp=0x140005bbf20 pc=0x101b8ee94
runtime.main()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x140005bbfd0 sp=0x140005bbf40 pc=0x100e23464
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140005bbfd0 sp=0x140005bbfd0 pc=0x100e5f414

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x100e56e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.forcegchelper()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x100e237b8
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x100e5f414
created by runtime.init.7 in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x100e56e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.bgsweep(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x100e0e898
runtime.gcenable.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x100e02698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x100e5f414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x102254360?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x100e56e98
runtime.goparkunlock(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x1030a3960)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x100e0c32c
runtime.bgscavenge(0x14000098000)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x100e0c8cc
runtime.gcenable.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x100e02638
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x100e5f414
created by runtime.gcenable in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac

goroutine 18 gp=0x14000102700 m=nil [finalizer wait]:
runtime.gopark(0x180006c5c8?, 0x10338db88?, 0xc0?, 0x85?, 0x1c0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x100e56e98
runtime.runfinq()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x100e01698
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x100e5f414
created by runtime.createfing in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80

goroutine 5 gp=0x14000003a40 m=nil [chan receive]:
runtime.gopark(0x140000b72c0?, 0x14000336048?, 0x48?, 0xe7?, 0x100f1fc58?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e6f0 sp=0x1400006e6d0 pc=0x100e56e98
runtime.chanrecv(0x1400003a230, 0x0, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x1400006e770 sp=0x1400006e6f0 pc=0x100df3a0c
runtime.chanrecv1(0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x1400006e7a0 sp=0x1400006e770 pc=0x100df35a4
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x1400006e7d0 sp=0x1400006e7a0 pc=0x100e058bc
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x100e5f414
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78

goroutine 6 gp=0x14000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068710 sp=0x140000686f0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000687b0 sp=0x14000068710 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 20 gp=0x14000103180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 34 gp=0x14000306000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030c710 sp=0x1400030c6f0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400030c7b0 sp=0x1400030c710 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400030c7d0 sp=0x1400030c7b0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030c7d0 sp=0x1400030c7d0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 7 gp=0x14000003dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 8 gp=0x140004ea000 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfe385a3160b?, 0x3?, 0x55?, 0x4c?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ff10 sp=0x1400006fef0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006ffb0 sp=0x1400006ff10 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006ffd0 sp=0x1400006ffb0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006ffd0 sp=0x1400006ffd0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 9 gp=0x140004ea1c0 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfe385a144e0?, 0x3?, 0x0?, 0x7d?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000308710 sp=0x140003086f0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140003087b0 sp=0x14000308710 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140003087d0 sp=0x140003087b0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140003087d0 sp=0x140003087d0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 21 gp=0x14000103340 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfe385a2242e?, 0x3?, 0x5a?, 0x6e?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 35 gp=0x140003061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfe385a23de6?, 0x3?, 0x1e?, 0x2?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030cf10 sp=0x1400030cef0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400030cfb0 sp=0x1400030cf10 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400030cfd0 sp=0x1400030cfb0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030cfd0 sp=0x1400030cfd0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 10 gp=0x140004ea380 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfe385a26302?, 0x1?, 0xcd?, 0x4e?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000308f10 sp=0x14000308ef0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000308fb0 sp=0x14000308f10 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000308fd0 sp=0x14000308fb0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000308fd0 sp=0x14000308fd0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 36 gp=0x14000306380 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfe385a23187?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030d710 sp=0x1400030d6f0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400030d7b0 sp=0x1400030d710 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400030d7d0 sp=0x1400030d7b0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030d7d0 sp=0x1400030d7d0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 37 gp=0x14000306540 m=nil [GC worker (idle)]:
runtime.gopark(0x6bfe385a22816?, 0x3?, 0xca?, 0x8c?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030df10 sp=0x1400030def0 pc=0x100e56e98
runtime.gcBgMarkWorker(0x1400003b490)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400030dfb0 sp=0x1400030df10 pc=0x100e04b2c
runtime.gcBgMarkStartWorkers.gowrap1()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400030dfd0 sp=0x1400030dfb0 pc=0x100e04a18
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030dfd0 sp=0x1400030dfd0 pc=0x100e5f414
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140

goroutine 38 gp=0x14000582380 m=nil [select]:
runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x100e56e98
runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x100e36ad4
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000596140, {0x1025f31a0, 0x1400036a9a0}, 0x140004472c0)
	/Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1013499dc
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x1025f31a0?, 0x1400036a9a0?}, 0x14000045b28?)
	<autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x10134c600
net/http.HandlerFunc.ServeHTTP(0x1400059a000?, {0x1025f31a0?, 0x1400036a9a0?}, 0x14000045b10?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10111ee28
net/http.(*ServeMux).ServeHTTP(0x10?, {0x1025f31a0, 0x1400036a9a0}, 0x140004472c0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x1011209b4
net/http.serverHandler.ServeHTTP({0x1025ef230?}, {0x1025f31a0?, 0x1400036a9a0?}, 0x1?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x10113c69c
net/http.(*conn).serve(0x140002a8240, {0x1025f59f8, 0x14000592360})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x10111d5cc
net/http.(*Server).Serve.gowrap3()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x101122790
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x100e5f414
created by net/http.(*Server).Serve in goroutine 1
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8

goroutine 82 gp=0x14000583340 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x100e7ac30?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030fd80 sp=0x1400030fd60 pc=0x100e56e98
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x1400030fdc0 sp=0x1400030fd80 pc=0x100e1c8f8
internal/poll.runtime_pollWait(0x14b218a38, 0x72)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x1400030fdf0 sp=0x1400030fdc0 pc=0x100e56050
internal/poll.(*pollDesc).wait(0x1400029e000?, 0x14000592041?, 0x0)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400030fe20 sp=0x1400030fdf0 pc=0x100ed6fe8
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x1400029e000, {0x14000592041, 0x1, 0x1})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x1400030fec0 sp=0x1400030fe20 pc=0x100ed829c
net.(*netFD).Read(0x1400029e000, {0x14000592041?, 0x1400030ff58?, 0x101118044?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x1400030ff10 sp=0x1400030fec0 pc=0x100f4a0f8
net.(*conn).Read(0x14000122058, {0x14000592041?, 0x0?, 0x0?})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x1400030ff60 sp=0x1400030ff10 pc=0x100f56fc4
net/http.(*connReader).backgroundRead(0x14000592030)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x1400030ffb0 sp=0x1400030ff60 pc=0x101117f40
net/http.(*connReader).startBackgroundRead.gowrap2()
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x1400030ffd0 sp=0x1400030ffb0 pc=0x101117e28
runtime.goexit({})
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030ffd0 sp=0x1400030ffd0 pc=0x100e5f414
created by net/http.(*connReader).startBackgroundRead in goroutine 38
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x183a09df8
r5      0x17398b650
r6      0x36
r7      0x0
r8      0xc171944f973fb752
r9      0xc171944ee4a68752
r10     0x3bb
r11     0x6
r12     0x6
r13     0x17398b382
r14     0x1023263a8
r15     0x1
r16     0x148
r17     0x1f3a40ac0
r18     0x0
r19     0x6
r20     0x3313
r21     0x1739930e0
r22     0x0
r23     0x2
r24     0x152f053d8
r25     0x17398ca08
r26     0xcf80082c0
r27     0xcf8008000
r28     0x1
r29     0x17398bf40
lr      0x183afd88c
sp      0x17398bf20
pc      0x183ac4388
fault   0x183ac4388
time=2026-04-02T10:54:06.529-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
time=2026-04-02T10:54:06.529-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50238/completion\": EOF"
[GIN] 2026/04/02 - 10:54:06 | 500 |  20.05786875s |       127.0.0.1 | POST     "/api/chat"
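
For anyone triaging a dump like this one, the lines that matter are the few `level=ERROR` entries at the end, not the hundreds of idle goroutine frames above them. Below is a minimal sketch of filtering them out, assuming the `time=… level=… source=… msg="…"` lines follow the usual key=value (logfmt-style) layout seen above. The helper names `parse_logfmt` and `error_entries` are hypothetical and not part of Ollama; only the Python standard library is used.

```python
import shlex


def parse_logfmt(line):
    """Split one key=value log line into a dict; return {} if it cannot be tokenized."""
    try:
        # POSIX-mode shlex keeps quoted values together and unescapes \" inside them.
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes (common in free-form lines): treat as "not a structured entry".
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep and key:
            fields[key] = value
    return fields


def error_entries(lines):
    """Return the parsed fields of every line whose level is ERROR."""
    parsed = (parse_logfmt(line) for line in lines)
    return [fields for fields in parsed if fields.get("level") == "ERROR"]


# Sample lines copied from the tail of the log above.
SAMPLE = [
    r'time=2026-04-02T10:54:06.529-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"',
    r'time=2026-04-02T10:54:06.529-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50238/completion\": EOF"',
    r'[GIN] 2026/04/02 - 10:54:06 | 500 |  20.05786875s |       127.0.0.1 | POST     "/api/chat"',
]

if __name__ == "__main__":
    for entry in error_entries(SAMPLE):
        print(entry["time"], entry["source"], entry["msg"], "->", entry.get("error"))
```

To run it against a real file, pass the lines of `~/.ollama/logs/server.log` to `error_entries` instead of `SAMPLE`. Stack-trace frames and `[GIN]` access lines carry no `level` key, so they are skipped.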
<!-- gh-comment-id:4179645954 --> @micseydel commented on GitHub (Apr 2, 2026):

Here's the full thing all at once:

```
time=2026-04-02T10:11:41.544-07:00 level=INFO source=routes.go:1742 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/micseydel/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:true OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2026-04-02T10:11:41.545-07:00 level=INFO source=routes.go:1744 msg="Ollama cloud disabled: true"
time=2026-04-02T10:11:41.552-07:00 level=INFO source=images.go:477 msg="total blobs: 86"
time=2026-04-02T10:11:41.552-07:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-04-02T10:11:41.553-07:00 level=INFO source=routes.go:1800 msg="Listening on [::]:11434 (version 0.19.0)"
time=2026-04-02T10:11:41.553-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-02T10:11:41.554-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --ollama-engine --port 50139" time=2026-04-02T10:11:41.668-07:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M4 Pro" libdirs="" driver=0.0 pci_id="" type=discrete total="48.0 GiB" available="48.0 GiB" time=2026-04-02T10:11:41.668-07:00 level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="48.0 GiB" default_num_ctx=262144 [GIN] 2026/04/02 - 10:11:41 | 200 | 377.084µs | 127.0.0.1 | GET "/api/version" [GIN] 2026/04/02 - 10:11:46 | 200 | 156.458µs | 127.0.0.1 | HEAD "/" [GIN] 2026/04/02 - 10:11:46 | 200 | 103.326333ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/04/02 - 10:11:46 | 200 | 53.286083ms | 127.0.0.1 | POST "/api/show" ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.006 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: Apple M4 Pro ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. 
= true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = false ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 51539.61 MB llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12 llama_model_loader: - kv 3: general.version str = 2024-12 llama_model_loader: - kv 4: general.finetune str = Instruct llama_model_loader: - kv 5: general.basename str = Llama-3.1 llama_model_loader: - kv 6: general.size_label str = 70B llama_model_loader: - kv 7: general.license str = llama3.1 llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla... llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ... 
llama_model_loader: - kv 14: llama.block_count u32 = 80 llama_model_loader: - kv 15: llama.context_length u32 = 131072 llama_model_loader: - kv 16: llama.embedding_length u32 = 8192 llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 18: llama.attention.head_count u32 = 64 llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 22: llama.attention.key_length u32 = 128 llama_model_loader: - kv 23: llama.attention.value_length u32 = 128 llama_model_loader: - kv 24: general.file_type u32 = 15 llama_model_loader: - kv 25: llama.vocab_size u32 = 128256 llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 35: general.quantization_version u32 = 2 llama_model_loader: - type f32: 162 tensors llama_model_loader: - type q4_K: 441 tensors llama_model_loader: - type q5_K: 40 tensors llama_model_loader: - type q6_K: 81 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 39.59 GiB (4.82 BPW) load: printing all EOG tokens: load: - 128001 ('<|end_of_text|>') load: - 128008 ('<|eom_id|>') load: - 128009 ('<|eot_id|>') load: special tokens cache size = 256 load: token to piece cache size = 0.7999 MB print_info: arch = llama print_info: vocab_only = 1 print_info: no_alloc = 0 print_info: model type = ?B print_info: model params = 70.55 B print_info: general.name = Llama 3.1 70B Instruct 2024 12 print_info: vocab type = BPE print_info: n_vocab = 128256 print_info: n_merges = 280147 print_info: BOS token = 128000 '<|begin_of_text|>' print_info: EOS token = 128009 '<|eot_id|>' print_info: EOT token = 128001 '<|end_of_text|>' print_info: EOM token = 128008 '<|eom_id|>' print_info: LF token = 198 'Ċ' print_info: EOG token = 128001 '<|end_of_text|>' print_info: EOG token = 128008 '<|eom_id|>' print_info: EOG token = 128009 '<|eot_id|>' print_info: max token length = 256 llama_model_load: vocab only - skipping tensors time=2026-04-02T10:11:47.088-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072 time=2026-04-02T10:11:47.089-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50145" time=2026-04-02T10:11:47.092-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="47.1 GiB" free_swap="0 B" time=2026-04-02T10:11:47.092-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" 
free="48.0 GiB" minimum="512.0 MiB" overhead="0 B" time=2026-04-02T10:11:47.092-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1 time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB" time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB" time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB" time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB" time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB" time=2026-04-02T10:11:47.093-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB" time=2026-04-02T10:11:47.119-07:00 level=INFO source=runner.go:965 msg="starting go runner" ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.007 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: Apple M4 Pro ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. 
= true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = false ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 51539.61 MB time=2026-04-02T10:11:47.119-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang) time=2026-04-02T10:11:47.193-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50145" time=2026-04-02T10:11:47.204-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free time=2026-04-02T10:11:47.205-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding" time=2026-04-02T10:11:47.205-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12 llama_model_loader: - kv 3: general.version str = 2024-12 llama_model_loader: - kv 4: general.finetune str = Instruct llama_model_loader: - kv 5: general.basename str = Llama-3.1 llama_model_loader: - kv 6: general.size_label str = 70B llama_model_loader: - kv 7: general.license str = llama3.1 llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla... llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ... 
llama_model_loader: - kv 14: llama.block_count u32 = 80 llama_model_loader: - kv 15: llama.context_length u32 = 131072 llama_model_loader: - kv 16: llama.embedding_length u32 = 8192 llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 18: llama.attention.head_count u32 = 64 llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 22: llama.attention.key_length u32 = 128 llama_model_loader: - kv 23: llama.attention.value_length u32 = 128 llama_model_loader: - kv 24: general.file_type u32 = 15 llama_model_loader: - kv 25: llama.vocab_size u32 = 128256 llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 35: general.quantization_version u32 = 2 llama_model_loader: - type f32: 162 tensors llama_model_loader: - type q4_K: 441 tensors llama_model_loader: - type q5_K: 40 tensors llama_model_loader: - type q6_K: 81 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 39.59 GiB (4.82 BPW) load: printing all EOG tokens: load: - 128001 ('<|end_of_text|>') load: - 128008 ('<|eom_id|>') load: - 128009 ('<|eot_id|>') load: special tokens cache size = 256 load: token to piece cache size = 0.7999 MB print_info: arch = llama print_info: vocab_only = 0 print_info: no_alloc = 0 print_info: n_ctx_train = 131072 print_info: n_embd = 8192 print_info: n_embd_inp = 8192 print_info: n_layer = 80 print_info: n_head = 64 print_info: n_head_kv = 8 print_info: n_rot = 128 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 8 print_info: n_embd_k_gqa = 1024 print_info: n_embd_v_gqa = 1024 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-05 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 28672 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 1 print_info: pooling type = 0 print_info: rope type = 0 print_info: rope scaling = linear print_info: freq_base_train = 500000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 131072 print_info: rope_yarn_log_mul= 0.0000 print_info: rope_finetuned = unknown print_info: model type = 70B print_info: model params = 70.55 B print_info: general.name = Llama 3.1 70B Instruct 2024 12 print_info: vocab type = BPE print_info: n_vocab = 128256 print_info: n_merges = 280147 print_info: BOS token = 128000 '<|begin_of_text|>' print_info: EOS token = 128009 
'<|eot_id|>' print_info: EOT token = 128001 '<|end_of_text|>' print_info: EOM token = 128008 '<|eom_id|>' print_info: LF token = 198 'Ċ' print_info: EOG token = 128001 '<|end_of_text|>' print_info: EOG token = 128008 '<|eom_id|>' print_info: EOG token = 128009 '<|eot_id|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... (mmap = true) load_tensors: offloading 30 repeating layers to GPU load_tensors: offloaded 30/81 layers to GPU load_tensors: CPU_Mapped model buffer size = 40543.11 MiB load_tensors: Metal_Mapped model buffer size = 39721.13 MiB llama_context: constructing llama_context llama_context: n_seq_max = 1 llama_context: n_ctx = 131072 llama_context: n_ctx_seq = 131072 llama_context: n_batch = 512 llama_context: n_ubatch = 512 llama_context: causal_attn = 1 llama_context: flash_attn = auto llama_context: kv_unified = false llama_context: freq_base = 500000.0 llama_context: freq_scale = 1 ggml_metal_init: allocating ggml_metal_init: picking default device: Apple M4 Pro ggml_metal_init: use fusion = true ggml_metal_init: use concurrency = true ggml_metal_init: use graph optimize = true llama_context: CPU output buffer size = 0.52 MiB llama_kv_cache: CPU KV buffer size = 25600.00 MiB llama_kv_cache: Metal KV buffer size = 15360.00 MiB llama_kv_cache: size = 40960.00 MiB (131072 cells, 80 layers, 1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB llama_context: Flash Attention was auto, set to enabled llama_context: Metal compute buffer size = 328.01 MiB llama_context: CPU compute buffer size = 448.01 MiB llama_context: graph nodes = 2487 llama_context: graph splits = 503 (with bs=512), 3 (with bs=1) time=2026-04-02T10:12:09.571-07:00 level=INFO source=server.go:1390 msg="llama runner started in 22.48 seconds" time=2026-04-02T10:12:09.572-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1 time=2026-04-02T10:12:09.572-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start 
responding" time=2026-04-02T10:12:09.573-07:00 level=INFO source=server.go:1390 msg="llama runner started in 22.48 seconds" [GIN] 2026/04/02 - 10:12:09 | 200 | 22.747522209s | 127.0.0.1 | POST "/api/generate" ggml_metal_synchronize: error: command buffer 0 failed with status 5 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) ggml-metal-context.m:235: fatal error WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info. WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash. See: https://github.com/ggml-org/llama.cpp/pull/17869 0 ollama 0x0000000104ff6ae4 ggml_print_backtrace + 276 1 ollama 0x0000000104ff6cd0 ggml_abort + 156 2 ollama 0x000000010525f340 ggml_metal_synchronize + 208 3 ollama 0x0000000105015ae0 ggml_backend_sched_graph_compute_async + 924 4 ollama 0x000000010508b888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160 5 ollama 0x000000010508b538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588 6 ollama 0x000000010508cc04 _ZN13llama_context6decodeERK11llama_batch + 1556 7 ollama 0x00000001050914a0 llama_decode + 20 8 ollama 0x0000000104faf3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72 9 ollama 0x00000001040df20c ollama + 520716 SIGABRT: abort PC=0x183ac4388 m=12 sigcode=0 signal arrived during cgo execution goroutine 50 gp=0x140004eaa80 m=12 mp=0x14000428808 [syscall]: runtime.cgocall(0x104faf398, 0x14000294b58) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000294b20 sp=0x14000294ae0 pc=0x1040d3974 github.com/ollama/ollama/llama._Cfunc_llama_decode(0x156e073c0, {0x10, 0x153822c00, 0x0, 0x153823400, 0x153823c00, 0x153809a00, 0x14fc04510}) _cgo_gotypes.go:685 +0x34 fp=0x14000294b50 sp=0x14000294b20 pc=0x104524c44 github.com/ollama/ollama/llama.(*Context).Decode.func1(...) 
/Users/runner/work/ollama/ollama/llama/llama.go:173 github.com/ollama/ollama/llama.(*Context).Decode(0x1400012ca00?, 0x1040d72f8?) /Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000294c40 sp=0x14000294b50 pc=0x104527008 github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000612140, 0x140000b46e0, 0x1400024ef18) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000294ed0 sp=0x14000294c40 pc=0x1045c8058 github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000612140, {0x105875a30, 0x140006100a0}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000294fa0 sp=0x14000294ed0 pc=0x1045c7d04 github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000294fd0 sp=0x14000294fa0 pc=0x1045cc210 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000294fd0 sp=0x14000294fd0 pc=0x1040df414 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140003db710 sp=0x140003db6f0 pc=0x1040d6e98 runtime.netpollblock(0x140003db7a8?, 0x415b7d0?, 0x1?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140003db750 sp=0x140003db710 pc=0x10409c8f8 internal/poll.runtime_pollWait(0x14e542550, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140003db780 sp=0x140003db750 pc=0x1040d6050 internal/poll.(*pollDesc).wait(0x1400060e100?, 0x10407eccc?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140003db7b0 sp=0x140003db780 pc=0x104156fe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0x1400060e100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x140003db860 sp=0x140003db7b0 pc=0x10415b8bc net.(*netFD).accept(0x1400060e100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x140003db920 sp=0x140003db860 pc=0x1041cbb28 net.(*TCPListener).accept(0x140007220c0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x140003db970 sp=0x140003db920 pc=0x1041e0304 net.(*TCPListener).Accept(0x140007220c0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140003db9b0 sp=0x140003db970 pc=0x1041df2ec net/http.(*onceCloseListener).Accept(0x1400016dcb0?) 
<autogenerated>:1 +0x30 fp=0x140003db9d0 sp=0x140003db9b0 pc=0x1043c8cc0 net/http.(*Server).Serve(0x1400012c900, {0x105872fc0, 0x140007220c0}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x140003dbb00 sp=0x140003db9d0 pc=0x1043a2400 github.com/ollama/ollama/runner/llamarunner.Execute({0x14000132140, 0x4, 0x4}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x140003dbcd0 sp=0x140003dbb00 pc=0x1045cbfec github.com/ollama/ollama/runner.Execute({0x14000132130?, 0x0?, 0x0?}) /Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x140003dbd10 sp=0x140003dbcd0 pc=0x1047086fc github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035400?, {0x1052b4986?, 0x4?, 0x1052b498a?}) /Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x140003dbd40 sp=0x140003dbd10 pc=0x104e0d714 github.com/spf13/cobra.(*Command).execute(0x14000313b08, {0x14000533680, 0x4, 0x4}) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x140003dbe60 sp=0x140003dbd40 pc=0x10423a9c8 github.com/spf13/cobra.(*Command).ExecuteC(0x14000236908) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x140003dbf20 sp=0x140003dbe60 pc=0x10423b110 github.com/spf13/cobra.(*Command).Execute(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985 main.main() /Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x140003dbf40 sp=0x140003dbf20 pc=0x104e0ee94 runtime.main() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x140003dbfd0 sp=0x140003dbf40 pc=0x1040a3464 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140003dbfd0 sp=0x140003dbfd0 pc=0x1040df414 goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x1040d6e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.forcegchelper() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x1040a37b8 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x1040df414 created by runtime.init.7 in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24 goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x1040d6e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.bgsweep(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x10408e898 runtime.gcenable.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x104082698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x1040df414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x1054d4360?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x1040d6e98 runtime.goparkunlock(...) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.(*scavengerState).park(0x106323960) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x10408c32c runtime.bgscavenge(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x10408c8cc runtime.gcenable.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x104082638 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x1040df414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac goroutine 18 gp=0x14000102700 m=nil [finalizer wait]: runtime.gopark(0x180006c5c8?, 0x10670db88?, 0xc0?, 0x85?, 0x1c0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x1040d6e98 runtime.runfinq() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x104081698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x1040df414 created by runtime.createfing in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80 goroutine 5 gp=0x14000003a40 m=nil [chan receive]: runtime.gopark(0x140000b7180?, 0x14000404018?, 0x48?, 0xe7?, 0x10419fc58?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e6f0 sp=0x1400006e6d0 pc=0x1040d6e98 runtime.chanrecv(0x1400003a230, 0x0, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x1400006e770 sp=0x1400006e6f0 pc=0x104073a0c runtime.chanrecv1(0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x1400006e7a0 sp=0x1400006e770 pc=0x1040735a4 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x1400006e7d0 sp=0x1400006e7a0 pc=0x1040858bc runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x1040df414 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78 goroutine 6 gp=0x14000003c00 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 7 gp=0x14000003dc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 34 gp=0x14000306000 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068710 sp=0x140000686f0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000687b0 sp=0x14000068710 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250710 sp=0x140002506f0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002507b0 sp=0x14000250710 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002507d0 sp=0x140002507b0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002507d0 sp=0x140002507d0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 8 gp=0x140004ea000 m=nil [GC worker (idle)]: runtime.gopark(0x6bd9a9c7cfb6c?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ff10 sp=0x1400006fef0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006ffb0 sp=0x1400006ff10 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006ffd0 sp=0x1400006ffb0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006ffd0 sp=0x1400006ffd0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 35 gp=0x140003061c0 m=nil [GC worker (idle)]: runtime.gopark(0x6bd9a9c800e88?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 9 gp=0x140004ea1c0 m=nil [GC worker (idle)]: runtime.gopark(0x6bd9a9c7ccb3f?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 10 gp=0x140004ea380 m=nil [GC worker (idle)]: runtime.gopark(0x6bd9a9c7ca964?, 0x3?, 0x8a?, 0x5b?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024cf10 sp=0x1400024cef0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024cfb0 sp=0x1400024cf10 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024cfd0 sp=0x1400024cfb0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024cfd0 sp=0x1400024cfd0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 11 gp=0x140004ea540 m=nil [GC worker (idle)]: runtime.gopark(0x6bd9a9c7c8acb?, 0x140004e41e0?, 0x1b?, 0xa?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024d710 sp=0x1400024d6f0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024d7b0 sp=0x1400024d710 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024d7d0 sp=0x1400024d7b0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024d7d0 sp=0x1400024d7d0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 12 gp=0x140004ea700 m=nil [GC worker (idle)]: runtime.gopark(0x6bd9a9c7cf007?, 0x1?, 0x70?, 0x94?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024df10 sp=0x1400024def0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024dfb0 sp=0x1400024df10 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024dfd0 sp=0x1400024dfb0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024dfd0 sp=0x1400024dfd0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 13 gp=0x140004ea8c0 m=nil [GC worker (idle)]: runtime.gopark(0x106371000?, 0x1?, 0x6c?, 0x52?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024e710 sp=0x1400024e6f0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024e7b0 sp=0x1400024e710 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024e7d0 sp=0x1400024e7b0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024e7d0 sp=0x1400024e7d0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 36 gp=0x14000306380 m=nil [GC worker (idle)]: runtime.gopark(0x6bd9a9c802e45?, 0x3?, 0xd8?, 0xcf?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000083f10 sp=0x14000083ef0 pc=0x1040d6e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000083fb0 sp=0x14000083f10 pc=0x104084b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000083fd0 sp=0x14000083fb0 pc=0x104084a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000083fd0 sp=0x14000083fd0 pc=0x1040df414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 51 gp=0x140004eac40 m=nil [select]: runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x1040d6e98 runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x1040b6ad4 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000612140, {0x1058731a0, 0x140003dc460}, 0x14000174500) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1045c99dc github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x1058731a0?, 0x140003dc460?}, 0x14000045b28?) <autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x1045cc600 net/http.HandlerFunc.ServeHTTP(0x1400061e000?, {0x1058731a0?, 0x140003dc460?}, 0x14000045b10?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10439ee28 net/http.(*ServeMux).ServeHTTP(0x10?, {0x1058731a0, 0x140003dc460}, 0x14000174500) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x1043a09b4 net/http.serverHandler.ServeHTTP({0x10586f230?}, {0x1058731a0?, 0x140003dc460?}, 0x1?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x1043bc69c net/http.(*conn).serve(0x1400016dcb0, {0x1058759f8, 0x1400060c390}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x10439d5cc net/http.(*Server).Serve.gowrap3() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x1043a2790 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x1040df414 created by net/http.(*Server).Serve in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8 goroutine 141 gp=0x140004eafc0 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1040fac30?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000332d80 sp=0x14000332d60 pc=0x1040d6e98 runtime.netpollblock(0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000332dc0 sp=0x14000332d80 pc=0x10409c8f8 internal/poll.runtime_pollWait(0x14e542438, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000332df0 sp=0x14000332dc0 pc=0x1040d6050 internal/poll.(*pollDesc).wait(0x1400060e000?, 0x1400060c3d1?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000332e20 sp=0x14000332df0 pc=0x104156fe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x1400060e000, {0x1400060c3d1, 0x1, 0x1}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x14000332ec0 sp=0x14000332e20 pc=0x10415829c net.(*netFD).Read(0x1400060e000, {0x1400060c3d1?, 0x14000332f58?, 0x104398044?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x14000332f10 sp=0x14000332ec0 pc=0x1041ca0f8 net.(*conn).Read(0x14000070060, {0x1400060c3d1?, 0x0?, 0x0?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x14000332f60 sp=0x14000332f10 pc=0x1041d6fc4 net/http.(*connReader).backgroundRead(0x1400060c3c0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x14000332fb0 sp=0x14000332f60 pc=0x104397f40 net/http.(*connReader).startBackgroundRead.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x14000332fd0 sp=0x14000332fb0 pc=0x104397e28 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000332fd0 sp=0x14000332fd0 pc=0x1040df414 created by net/http.(*connReader).startBackgroundRead in goroutine 51 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4 r0 0x0 r1 0x0 r2 0x0 r3 0x0 r4 0x183a09df8 r5 0x17981f650 r6 0x36 r7 0x0 r8 0x7b7c0b534c79967d r9 
0x7b7c0b5235fbe67d r10 0x3bb r11 0x6 r12 0x6 r13 0x17981f382 r14 0x1023263a8 r15 0x1 r16 0x148 r17 0x1f3a40ac0 r18 0x0 r19 0x6 r20 0x3213 r21 0x1798270e0 r22 0x0 r23 0x2 r24 0x156e079c8 r25 0x179820a08 r26 0xd088402c0 r27 0xd08840000 r28 0x1 r29 0x17981ff40 lr 0x183afd88c sp 0x17981ff20 pc 0x183ac4388 fault 0x183ac4388 time=2026-04-02T10:12:38.031-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50145/completion\": EOF" time=2026-04-02T10:12:38.031-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2" [GIN] 2026/04/02 - 10:12:38 | 500 | 22.210161209s | 127.0.0.1 | POST "/api/chat" [GIN] 2026/04/02 - 10:40:11 | 200 | 45.292µs | 127.0.0.1 | HEAD "/" [GIN] 2026/04/02 - 10:40:11 | 404 | 7.332333ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/04/02 - 10:40:11 | 200 | 319.062708ms | 127.0.0.1 | POST "/api/pull" [GIN] 2026/04/02 - 10:40:19 | 200 | 32.416µs | 127.0.0.1 | HEAD "/" [GIN] 2026/04/02 - 10:40:19 | 200 | 100.795667ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/04/02 - 10:40:19 | 200 | 52.591084ms | 127.0.0.1 | POST "/api/show" llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /Users/micseydel/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct llama_model_loader: - kv 2: llama.block_count u32 = 32 llama_model_loader: - kv 3: llama.context_length u32 = 8192 llama_model_loader: - kv 4: llama.embedding_length u32 = 4096 llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 6: llama.attention.head_count u32 = 32 llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 10: general.file_type u32 = 2 llama_model_loader: - kv 11: llama.vocab_size u32 = 128256 llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ... 
llama_model_loader: - kv 21: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type q4_0: 225 tensors llama_model_loader: - type q6_K: 1 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_0 print_info: file size = 4.33 GiB (4.64 BPW) load: printing all EOG tokens: load: - 128001 ('<|end_of_text|>') load: - 128009 ('<|eot_id|>') load: special tokens cache size = 256 load: token to piece cache size = 0.8000 MB print_info: arch = llama print_info: vocab_only = 1 print_info: no_alloc = 0 print_info: model type = ?B print_info: model params = 8.03 B print_info: general.name = Meta-Llama-3-8B-Instruct print_info: vocab type = BPE print_info: n_vocab = 128256 print_info: n_merges = 280147 print_info: BOS token = 128000 '<|begin_of_text|>' print_info: EOS token = 128009 '<|eot_id|>' print_info: EOT token = 128001 '<|end_of_text|>' print_info: LF token = 198 'Ċ' print_info: EOG token = 128001 '<|end_of_text|>' print_info: EOG token = 128009 '<|eot_id|>' print_info: max token length = 256 llama_model_load: vocab only - skipping tensors time=2026-04-02T10:40:19.984-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=8192 time=2026-04-02T10:40:19.984-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --port 50182" time=2026-04-02T10:40:19.988-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="51.4 GiB" free_swap="0 B" time=2026-04-02T10:40:19.988-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B" time=2026-04-02T10:40:19.988-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=33 requested=-1 
time=2026-04-02T10:40:19.988-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="4.1 GiB" time=2026-04-02T10:40:19.988-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="1.0 GiB" time=2026-04-02T10:40:19.988-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="560.0 MiB" time=2026-04-02T10:40:19.988-07:00 level=INFO source=device.go:272 msg="total memory" size="5.6 GiB" time=2026-04-02T10:40:20.013-07:00 level=INFO source=runner.go:965 msg="starting go runner" ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.010 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: Apple M4 Pro ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. 
= true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = false ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 51539.61 MB time=2026-04-02T10:40:20.016-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang) time=2026-04-02T10:40:20.095-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50182" time=2026-04-02T10:40:20.100-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:8192 KvCacheType: NumThreads:8 GPULayers:33[ID:0 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free time=2026-04-02T10:40:20.100-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding" time=2026-04-02T10:40:20.100-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /Users/micseydel/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct llama_model_loader: - kv 2: llama.block_count u32 = 32 llama_model_loader: - kv 3: llama.context_length u32 = 8192 llama_model_loader: - kv 4: llama.embedding_length u32 = 4096 llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336 llama_model_loader: - kv 6: llama.attention.head_count u32 = 32 llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 10: general.file_type u32 = 2 llama_model_loader: - kv 11: llama.vocab_size u32 = 128256 llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ... 
llama_model_loader: - kv 21: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type q4_0: 225 tensors llama_model_loader: - type q6_K: 1 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_0 print_info: file size = 4.33 GiB (4.64 BPW) load: printing all EOG tokens: load: - 128001 ('<|end_of_text|>') load: - 128009 ('<|eot_id|>') load: special tokens cache size = 256 load: token to piece cache size = 0.8000 MB print_info: arch = llama print_info: vocab_only = 0 print_info: no_alloc = 0 print_info: n_ctx_train = 8192 print_info: n_embd = 4096 print_info: n_embd_inp = 4096 print_info: n_layer = 32 print_info: n_head = 32 print_info: n_head_kv = 8 print_info: n_rot = 128 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 4 print_info: n_embd_k_gqa = 1024 print_info: n_embd_v_gqa = 1024 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-05 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 14336 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 1 print_info: pooling type = 0 print_info: rope type = 0 print_info: rope scaling = linear print_info: freq_base_train = 500000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 8192 print_info: rope_yarn_log_mul= 0.0000 print_info: rope_finetuned = unknown print_info: model type = 8B print_info: model params = 8.03 B print_info: general.name = Meta-Llama-3-8B-Instruct print_info: vocab type = BPE print_info: n_vocab = 128256 print_info: n_merges = 280147 print_info: BOS token = 128000 '<|begin_of_text|>' print_info: EOS token = 128009 '<|eot_id|>' print_info: EOT token = 128001 '<|end_of_text|>' print_info: LF token = 198 'Ċ' print_info: EOG 
token = 128001 '<|end_of_text|>' print_info: EOG token = 128009 '<|eot_id|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... (mmap = true) load_tensors: offloading 32 repeating layers to GPU load_tensors: offloading output layer to GPU load_tensors: offloaded 33/33 layers to GPU load_tensors: CPU_Mapped model buffer size = 281.81 MiB load_tensors: Metal_Mapped model buffer size = 4155.99 MiB llama_context: constructing llama_context llama_context: n_seq_max = 1 llama_context: n_ctx = 8192 llama_context: n_ctx_seq = 8192 llama_context: n_batch = 512 llama_context: n_ubatch = 512 llama_context: causal_attn = 1 llama_context: flash_attn = auto llama_context: kv_unified = false llama_context: freq_base = 500000.0 llama_context: freq_scale = 1 ggml_metal_init: allocating ggml_metal_init: picking default device: Apple M4 Pro ggml_metal_init: use fusion = true ggml_metal_init: use concurrency = true ggml_metal_init: use graph optimize = true llama_context: CPU output buffer size = 0.50 MiB llama_kv_cache: Metal KV buffer size = 1024.00 MiB llama_kv_cache: size = 1024.00 MiB ( 8192 cells, 32 layers, 1/1 seqs), K (f16): 512.00 MiB, V (f16): 512.00 MiB llama_context: Flash Attention was auto, set to enabled llama_context: Metal compute buffer size = 258.50 MiB llama_context: CPU compute buffer size = 24.01 MiB llama_context: graph nodes = 999 llama_context: graph splits = 2 time=2026-04-02T10:40:22.363-07:00 level=INFO source=server.go:1390 msg="llama runner started in 2.37 seconds" time=2026-04-02T10:40:22.363-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1 time=2026-04-02T10:40:22.363-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding" time=2026-04-02T10:40:22.363-07:00 level=INFO source=server.go:1390 msg="llama runner started in 2.37 seconds" [GIN] 2026/04/02 - 10:40:22 | 200 | 2.56401975s | 127.0.0.1 | POST "/api/generate" [GIN] 2026/04/02 - 10:40:29 | 200 | 
5.376164166s | 127.0.0.1 | POST "/api/chat" [GIN] 2026/04/02 - 10:40:40 | 200 | 27.291µs | 127.0.0.1 | HEAD "/" [GIN] 2026/04/02 - 10:40:40 | 200 | 98.548667ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/04/02 - 10:40:40 | 200 | 54.164959ms | 127.0.0.1 | POST "/api/show" time=2026-04-02T10:40:40.755-07:00 level=INFO source=sched.go:627 msg="updated VRAM based on existing loaded models" gpu=0 library=Metal total="48.0 GiB" available="42.4 GiB" llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12 llama_model_loader: - kv 3: general.version str = 2024-12 llama_model_loader: - kv 4: general.finetune str = Instruct llama_model_loader: - kv 5: general.basename str = Llama-3.1 llama_model_loader: - kv 6: general.size_label str = 70B llama_model_loader: - kv 7: general.license str = llama3.1 llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla... llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ... 
llama_model_loader: - kv 14: llama.block_count u32 = 80 llama_model_loader: - kv 15: llama.context_length u32 = 131072 llama_model_loader: - kv 16: llama.embedding_length u32 = 8192 llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 18: llama.attention.head_count u32 = 64 llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 22: llama.attention.key_length u32 = 128 llama_model_loader: - kv 23: llama.attention.value_length u32 = 128 llama_model_loader: - kv 24: general.file_type u32 = 15 llama_model_loader: - kv 25: llama.vocab_size u32 = 128256 llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 35: general.quantization_version u32 = 2
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q4_K: 441 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 39.59 GiB (4.82 BPW)
load: printing all EOG tokens:
load: - 128001 ('<|end_of_text|>')
load: - 128008 ('<|eom_id|>')
load: - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 1
print_info: no_alloc = 0
print_info: model type = ?B
print_info: model params = 70.55 B
print_info: general.name = Llama 3.1 70B Instruct 2024 12
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128001 '<|end_of_text|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128001 '<|end_of_text|>'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-04-02T10:40:40.882-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-04-02T10:40:40.882-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50195"
time=2026-04-02T10:40:40.887-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="46.3 GiB" free_swap="0 B"
time=2026-04-02T10:40:40.887-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="41.9 GiB" free="42.4 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:40:40.887-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:40:40.887-07:00 level=INFO source=server.go:1031 msg="model requires more gpu memory than is currently available, evicting a model to make space" "loaded layers"=41
time=2026-04-02T10:40:40.931-07:00 level=INFO source=runner.go:965 msg="starting go runner"
time=2026-04-02T10:40:40.978-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="49.3 GiB" free_swap="0 B"
time=2026-04-02T10:40:40.978-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:40:40.978-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:40:40.979-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 51539.61 MB
time=2026-04-02T10:40:40.932-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:40:41.012-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50195"
time=2026-04-02T10:40:41.023-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:40:41.024-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:40:41.024-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12 llama_model_loader: - kv 3: general.version str = 2024-12 llama_model_loader: - kv 4: general.finetune str = Instruct llama_model_loader: - kv 5: general.basename str = Llama-3.1 llama_model_loader: - kv 6: general.size_label str = 70B llama_model_loader: - kv 7: general.license str = llama3.1 llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla... llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ... 
llama_model_loader: - kv 14: llama.block_count u32 = 80 llama_model_loader: - kv 15: llama.context_length u32 = 131072 llama_model_loader: - kv 16: llama.embedding_length u32 = 8192 llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 18: llama.attention.head_count u32 = 64 llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 22: llama.attention.key_length u32 = 128 llama_model_loader: - kv 23: llama.attention.value_length u32 = 128 llama_model_loader: - kv 24: general.file_type u32 = 15 llama_model_loader: - kv 25: llama.vocab_size u32 = 128256 llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 35: general.quantization_version u32 = 2
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q4_K: 441 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 39.59 GiB (4.82 BPW)
load: printing all EOG tokens:
load: - 128001 ('<|end_of_text|>')
load: - 128008 ('<|eom_id|>')
load: - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 8192
print_info: n_embd_inp = 8192
print_info: n_layer = 80
print_info: n_head = 64
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 28672
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: n_expert_groups = 0
print_info: n_group_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned = unknown
print_info: model type = 70B
print_info: model params = 70.55 B
print_info: general.name = Llama 3.1 70B Instruct 2024 12
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128001 '<|end_of_text|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128001 '<|end_of_text|>'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloaded 30/81 layers to GPU
load_tensors: CPU_Mapped model buffer size = 40543.11 MiB
load_tensors: Metal_Mapped model buffer size = 39721.13 MiB
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 131072
llama_context: n_ctx_seq = 131072
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 500000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M4 Pro
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
llama_context: CPU output buffer size = 0.52 MiB
llama_kv_cache: CPU KV buffer size = 25600.00 MiB
llama_kv_cache: Metal KV buffer size = 15360.00 MiB
llama_kv_cache: size = 40960.00 MiB (131072 cells, 80 layers, 1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context: Metal compute buffer size = 328.01 MiB
llama_context: CPU compute buffer size = 448.01 MiB
llama_context: graph nodes = 2487
llama_context: graph splits = 503 (with bs=512), 3 (with bs=1)
time=2026-04-02T10:40:45.792-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.91 seconds"
time=2026-04-02T10:40:45.793-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-02T10:40:45.793-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:40:45.793-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.91 seconds"
[GIN] 2026/04/02 - 10:40:45 | 200 | 5.097501292s | 127.0.0.1 | POST "/api/generate"
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml-metal-context.m:235: fatal error
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0 ollama 0x000000010512aae4 ggml_print_backtrace + 276
1 ollama 0x000000010512acd0 ggml_abort + 156
2 ollama 0x0000000105393340 ggml_metal_synchronize + 208
3 ollama 0x0000000105149ae0 ggml_backend_sched_graph_compute_async + 924
4 ollama 0x00000001051bf888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160
5 ollama 0x00000001051bf538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588
6 ollama 0x00000001051c0c04 _ZN13llama_context6decodeERK11llama_batch + 1556
7 ollama 0x00000001051c54a0 llama_decode + 20
8 ollama 0x00000001050e33e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72
9 ollama 0x000000010421320c ollama + 520716
SIGABRT: abort
PC=0x183ac4388 m=3 sigcode=0
signal arrived during cgo execution
goroutine 7 gp=0x140002f0000 m=3 mp=0x14000073008 [syscall]:
runtime.cgocall(0x1050e3398, 0x14000083b58)
	/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000083b20 sp=0x14000083ae0 pc=0x104207974
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x12fd04760, {0x10, 0x13a829400, 0x0, 0x13a829c00, 0x13a82a400, 0x13a83be00, 0x131604710})
	_cgo_gotypes.go:685 +0x34 fp=0x14000083b50 sp=0x14000083b20 pc=0x104658c44
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
/Users/runner/work/ollama/ollama/llama/llama.go:173 github.com/ollama/ollama/llama.(*Context).Decode(0x14000034300?, 0x10420b2f8?) /Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000083c40 sp=0x14000083b50 pc=0x10465b008 github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x140000b8140, 0x140002a6320, 0x1400024f718) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000083ed0 sp=0x14000083c40 pc=0x1046fc058 github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x140000b8140, {0x1059a9a30, 0x140000b60a0}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000083fa0 sp=0x14000083ed0 pc=0x1046fbd04 github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000083fd0 sp=0x14000083fa0 pc=0x104700210 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000083fd0 sp=0x14000083fd0 pc=0x104213414 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000519710 sp=0x140005196f0 pc=0x10420ae98 runtime.netpollblock(0x140004a37a8?, 0x428f7d0?, 0x1?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000519750 sp=0x14000519710 pc=0x1041d08f8 internal/poll.runtime_pollWait(0x12e5ceed0, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000519780 sp=0x14000519750 pc=0x10420a050 internal/poll.(*pollDesc).wait(0x140000b4100?, 0x104291a38?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005197b0 sp=0x14000519780 pc=0x10428afe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0x140000b4100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x14000519860 sp=0x140005197b0 pc=0x10428f8bc net.(*netFD).accept(0x140000b4100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x14000519920 sp=0x14000519860 pc=0x1042ffb28 net.(*TCPListener).accept(0x140000b2080) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x14000519970 sp=0x14000519920 pc=0x104314304 net.(*TCPListener).Accept(0x140000b2080) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140005199b0 sp=0x14000519970 pc=0x1043132ec net/http.(*onceCloseListener).Accept(0x1400016dcb0?) 
<autogenerated>:1 +0x30 fp=0x140005199d0 sp=0x140005199b0 pc=0x1044fccc0 net/http.(*Server).Serve(0x14000296100, {0x1059a6fc0, 0x140000b2080}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x14000519b00 sp=0x140005199d0 pc=0x1044d6400 github.com/ollama/ollama/runner/llamarunner.Execute({0x140001101a0, 0x4, 0x4}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x14000519cd0 sp=0x14000519b00 pc=0x1046fffec github.com/ollama/ollama/runner.Execute({0x14000110190?, 0x0?, 0x0?}) /Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x14000519d10 sp=0x14000519cd0 pc=0x10483c6fc github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035600?, {0x1053e8986?, 0x4?, 0x1053e898a?}) /Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x14000519d40 sp=0x14000519d10 pc=0x104f41714 github.com/spf13/cobra.(*Command).execute(0x140004dfb08, {0x140002899c0, 0x4, 0x4}) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x14000519e60 sp=0x14000519d40 pc=0x10436e9c8 github.com/spf13/cobra.(*Command).ExecuteC(0x1400029a908) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x14000519f20 sp=0x14000519e60 pc=0x10436f110 github.com/spf13/cobra.(*Command).Execute(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985 main.main() /Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x14000519f40 sp=0x14000519f20 pc=0x104f42e94 runtime.main() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x14000519fd0 sp=0x14000519f40 pc=0x1041d7464 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000519fd0 sp=0x14000519fd0 pc=0x104213414 goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x10420ae98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.forcegchelper() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x1041d77b8 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x104213414 created by runtime.init.7 in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24 goroutine 3 gp=0x14000003500 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x10420ae98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.bgsweep(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x1041c2898 runtime.gcenable.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x1041b6698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x104213414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c goroutine 4 gp=0x140000036c0 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x105608360?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x10420ae98 runtime.goparkunlock(...) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.(*scavengerState).park(0x106457960) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x1041c032c runtime.bgscavenge(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x1041c08cc runtime.gcenable.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1041b6638 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x104213414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac goroutine 18 gp=0x14000102700 m=nil [finalizer wait]: runtime.gopark(0x180006c5c8?, 0x106841b88?, 0xc0?, 0xc5?, 0x1c0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x10420ae98 runtime.runfinq() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x1041b5698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x104213414 created by runtime.createfing in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80 goroutine 34 gp=0x140002f01c0 m=nil [chan receive]: runtime.gopark(0x140002a9220?, 0x1400031c0c0?, 0x48?, 0x87?, 0x1042d3c58?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000686f0 sp=0x140000686d0 pc=0x10420ae98 runtime.chanrecv(0x140002f81c0, 0x0, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x14000068770 sp=0x140000686f0 pc=0x1041a7a0c runtime.chanrecv1(0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x140000687a0 sp=0x14000068770 pc=0x1041a75a4 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x140000687d0 sp=0x140000687a0 pc=0x1041b98bc runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x104213414 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78 goroutine 35 gp=0x140002f0380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 36 gp=0x140002f0540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250710 sp=0x140002506f0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002507b0 sp=0x14000250710 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002507d0 sp=0x140002507b0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002507d0 sp=0x140002507d0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 37 gp=0x140002f0700 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069f10 sp=0x14000069ef0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000069fb0 sp=0x14000069f10 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 38 gp=0x140002f08c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 39 gp=0x140002f0a80 m=nil [GC worker (idle)]: runtime.gopark(0x6bf2e4d42ef51?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006af10 sp=0x1400006aef0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006afb0 sp=0x1400006af10 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006afd0 sp=0x1400006afb0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006afd0 sp=0x1400006afd0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 40 gp=0x140002f0c40 m=nil [GC worker (idle)]: runtime.gopark(0x6bf2e4d433aad?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006b710 sp=0x1400006b6f0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006b7b0 sp=0x1400006b710 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006b7d0 sp=0x1400006b7b0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006b7d0 sp=0x1400006b7d0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 41 gp=0x140002f0e00 m=nil [GC worker (idle)]: runtime.gopark(0x6bf2e4d437e62?, 0x1?, 0x1c?, 0x1e?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006bf10 sp=0x1400006bef0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006bfb0 sp=0x1400006bf10 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006bfd0 sp=0x1400006bfb0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006bfd0 sp=0x1400006bfd0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 42 gp=0x140002f0fc0 m=nil [GC worker (idle)]: runtime.gopark(0x6bf2e4d431732?, 0x1?, 0x34?, 0x28?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 20 gp=0x14000103180 m=nil [GC worker (idle)]: runtime.gopark(0x6bf2e4d42e02e?, 0x1?, 0xcc?, 0x62?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250f10 sp=0x14000250ef0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000250fb0 sp=0x14000250f10 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000250fd0 sp=0x14000250fb0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000250fd0 sp=0x14000250fd0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]: runtime.gopark(0x6bf2e4d43274f?, 0x1?, 0xf?, 0x23?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x10420ae98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x1041b8b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x1041b8a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x104213414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 8 gp=0x140002f1500 m=nil [select]: runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x10420ae98 runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x1041eaad4 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x140000b8140, {0x1059a71a0, 0x140002629a0}, 0x14000455040) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1046fd9dc github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x1059a71a0?, 0x140002629a0?}, 0x14000045b28?) <autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x104700600 net/http.HandlerFunc.ServeHTTP(0x140000bc000?, {0x1059a71a0?, 0x140002629a0?}, 0x14000045b10?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x1044d2e28 net/http.(*ServeMux).ServeHTTP(0x10?, {0x1059a71a0, 0x140002629a0}, 0x14000455040) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x1044d49b4 net/http.serverHandler.ServeHTTP({0x1059a3230?}, {0x1059a71a0?, 0x140002629a0?}, 0x1?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x1044f069c net/http.(*conn).serve(0x1400016dcb0, {0x1059a99f8, 0x140000b0360}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x1044d15cc net/http.(*Server).Serve.gowrap3() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x1044d6790 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x104213414 created by net/http.(*Server).Serve in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8 goroutine 61 gp=0x140002f16c0 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x10422ec30?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251580 sp=0x14000251560 pc=0x10420ae98 runtime.netpollblock(0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140002515c0 sp=0x14000251580 pc=0x1041d08f8 internal/poll.runtime_pollWait(0x12e5cedb8, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140002515f0 sp=0x140002515c0 pc=0x10420a050 internal/poll.(*pollDesc).wait(0x140000b4180?, 0x140004cd571?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000251620 sp=0x140002515f0 pc=0x10428afe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x140000b4180, {0x140004cd571, 0x1, 0x1}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x140002516c0 sp=0x14000251620 pc=0x10428c29c net.(*netFD).Read(0x140000b4180, {0x140004cd571?, 0x14000251758?, 0x1044cc044?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x14000251710 sp=0x140002516c0 pc=0x1042fe0f8 net.(*conn).Read(0x140001241e8, {0x140004cd571?, 0x0?, 0x0?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x14000251760 sp=0x14000251710 pc=0x10430afc4 net/http.(*connReader).backgroundRead(0x140004cd560) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x140002517b0 sp=0x14000251760 pc=0x1044cbf40 net/http.(*connReader).startBackgroundRead.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x140002517d0 sp=0x140002517b0 pc=0x1044cbe28 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002517d0 sp=0x140002517d0 pc=0x104213414 created by net/http.(*connReader).startBackgroundRead in goroutine 8 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4 r0 0x0 r1 0x0 r2 0x0 r3 0x0 r4 0x183a09df8 r5 0x16cc6f650 r6 0x36 r7 0x0 r8 0x7e8bdc898cbb1630 r9 
0x7e8bdc88e07c6630 r10 0x3bb r11 0x6 r12 0x6 r13 0x16cc6f382 r14 0x1023263a8 r15 0x1 r16 0x148 r17 0x1f3a40ac0 r18 0x0 r19 0x6 r20 0x1b03 r21 0x16cc770e0 r22 0x0 r23 0x2 r24 0x12fd04d68 r25 0x16cc70a08 r26 0xd004b82c0 r27 0xd004b8000 r28 0x1 r29 0x16cc6ff40 lr 0x183afd88c sp 0x16cc6ff20 pc 0x183ac4388 fault 0x183ac4388 time=2026-04-02T10:41:08.344-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2" time=2026-04-02T10:41:08.344-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50195/completion\": EOF" [GIN] 2026/04/02 - 10:41:08 | 500 | 18.995847541s | 127.0.0.1 | POST "/api/chat" [GIN] 2026/04/02 - 10:42:03 | 200 | 44.875µs | 127.0.0.1 | HEAD "/" [GIN] 2026/04/02 - 10:42:03 | 200 | 98.78775ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/04/02 - 10:42:03 | 200 | 52.682542ms | 127.0.0.1 | POST "/api/show" llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv 3: general.version str = 2024-12
llama_model_loader: - kv 4: general.finetune str = Instruct
llama_model_loader: - kv 5: general.basename str = Llama-3.1
llama_model_loader: - kv 6: general.size_label str = 70B
llama_model_loader: - kv 7: general.license str = llama3.1
llama_model_loader: - kv 8: general.base_model.count u32 = 1
llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B
llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama
llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv 14: llama.block_count u32 = 80
llama_model_loader: - kv 15: llama.context_length u32 = 131072
llama_model_loader: - kv 16: llama.embedding_length u32 = 8192
llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 18: llama.attention.head_count u32 = 64
llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 22: llama.attention.key_length u32 = 128
llama_model_loader: - kv 23: llama.attention.value_length u32 = 128
llama_model_loader: - kv 24: general.file_type u32 = 15
llama_model_loader: - kv 25: llama.vocab_size u32 = 128256
llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 35: general.quantization_version u32 = 2
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q4_K: 441 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 39.59 GiB (4.82 BPW)
load: printing all EOG tokens:
load: - 128001 ('<|end_of_text|>')
load: - 128008 ('<|eom_id|>')
load: - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 1
print_info: no_alloc = 0
print_info: model type = ?B
print_info: model params = 70.55 B
print_info: general.name = Llama 3.1 70B Instruct 2024 12
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128001 '<|end_of_text|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128001 '<|end_of_text|>'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2026-04-02T10:42:03.905-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-04-02T10:42:03.906-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50202"
time=2026-04-02T10:42:03.909-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="57.5 GiB" free_swap="0 B"
time=2026-04-02T10:42:03.909-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:42:03.909-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:42:03.910-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
time=2026-04-02T10:42:03.935-07:00 level=INFO source=runner.go:965 msg="starting go runner"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 51539.61 MB
time=2026-04-02T10:42:03.937-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:42:04.016-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50202"
time=2026-04-02T10:42:04.022-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:42:04.022-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:42:04.022-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12
llama_model_loader: - kv 3: general.version str = 2024-12
llama_model_loader: - kv 4: general.finetune str = Instruct
llama_model_loader: - kv 5: general.basename str = Llama-3.1
llama_model_loader: - kv 6: general.size_label str = 70B
llama_model_loader: - kv 7: general.license str = llama3.1
llama_model_loader: - kv 8: general.base_model.count u32 = 1
llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B
llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama
llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ...
llama_model_loader: - kv 14: llama.block_count u32 = 80
llama_model_loader: - kv 15: llama.context_length u32 = 131072
llama_model_loader: - kv 16: llama.embedding_length u32 = 8192
llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 18: llama.attention.head_count u32 = 64
llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 22: llama.attention.key_length u32 = 128
llama_model_loader: - kv 23: llama.attention.value_length u32 = 128
llama_model_loader: - kv 24: general.file_type u32 = 15
llama_model_loader: - kv 25: llama.vocab_size u32 = 128256
llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 35: general.quantization_version u32 = 2
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q4_K: 441 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 81 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 39.59 GiB (4.82 BPW)
load: printing all EOG tokens:
load: - 128001 ('<|end_of_text|>')
load: - 128008 ('<|eom_id|>')
load: - 128009 ('<|eot_id|>')
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 8192
print_info: n_embd_inp = 8192
print_info: n_layer = 80
print_info: n_head = 64
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 28672
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: n_expert_groups = 0
print_info: n_group_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned = unknown
print_info: model type = 70B
print_info: model params = 70.55 B
print_info: general.name = Llama 3.1 70B Instruct 2024 12
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128001 '<|end_of_text|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128001 '<|end_of_text|>'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloaded 30/81 layers to GPU
load_tensors: CPU_Mapped model buffer size = 40543.11 MiB
load_tensors: Metal_Mapped model buffer size = 39721.13 MiB
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 131072
llama_context: n_ctx_seq = 131072
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 500000.0
llama_context: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M4 Pro
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
llama_context: CPU output buffer size = 0.52 MiB
llama_kv_cache: CPU KV buffer size = 25600.00 MiB
llama_kv_cache: Metal KV buffer size = 15360.00 MiB
llama_kv_cache: size = 40960.00 MiB (131072 cells, 80 layers, 1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context: Metal compute buffer size = 328.01 MiB
llama_context: CPU compute buffer size = 448.01 MiB
llama_context: graph nodes = 2487
llama_context: graph splits = 503 (with bs=512), 3 (with bs=1)
time=2026-04-02T10:42:08.795-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.89 seconds"
time=2026-04-02T10:42:08.795-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-02T10:42:08.795-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-04-02T10:42:08.795-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.89 seconds"
[GIN] 2026/04/02 - 10:42:08 | 200 | 5.09026575s | 127.0.0.1 | POST "/api/generate"
ggml_metal_synchronize: error: command buffer 0 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
ggml-metal-context.m:235: fatal error
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   ollama                              0x0000000101362ae4 ggml_print_backtrace + 276
1   ollama                              0x0000000101362cd0 ggml_abort + 156
2   ollama                              0x00000001015cb340 ggml_metal_synchronize + 208
3   ollama                              0x0000000101381ae0 ggml_backend_sched_graph_compute_async + 924
4   ollama                              0x00000001013f7888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160
5   ollama                              0x00000001013f7538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588
6   ollama                              0x00000001013f8c04 _ZN13llama_context6decodeERK11llama_batch + 1556
7   ollama                              0x00000001013fd4a0 llama_decode + 20
8   ollama                              0x000000010131b3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72
9   ollama                              0x000000010044b20c ollama + 520716
SIGABRT: abort
PC=0x183ac4388 m=4 sigcode=0
signal arrived during cgo execution

goroutine 66 gp=0x140004841c0 m=4 mp=0x14000100008 [syscall]:
runtime.cgocall(0x10131b398, 0x14000080b58)
    /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000080b20 sp=0x14000080ae0 pc=0x10043f974
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x12a504760, {0x10, 0x168008200, 0x0, 0x168008a00, 0x168009200, 0x168009a00, 0x12ba04080})
    _cgo_gotypes.go:685 +0x34 fp=0x14000080b50 sp=0x14000080b20 pc=0x100890c44
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
/Users/runner/work/ollama/ollama/llama/llama.go:173 github.com/ollama/ollama/llama.(*Context).Decode(0x14000034300?, 0x1004432f8?) /Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000080c40 sp=0x14000080b50 pc=0x100893008 github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000590140, 0x1400025e230, 0x14000253f18) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000080ed0 sp=0x14000080c40 pc=0x100934058 github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000590140, {0x101be1a30, 0x1400058e0a0}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000080fa0 sp=0x14000080ed0 pc=0x100933d04 github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000080fd0 sp=0x14000080fa0 pc=0x100938210 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000080fd0 sp=0x14000080fd0 pc=0x10044b414 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000523710 sp=0x140005236f0 pc=0x100442e98 runtime.netpollblock(0x140004a37a8?, 0x4c77d0?, 0x1?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000523750 sp=0x14000523710 pc=0x1004088f8 internal/poll.runtime_pollWait(0x12a297dd0, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000523780 sp=0x14000523750 pc=0x100442050 internal/poll.(*pollDesc).wait(0x1400058c100?, 0x1004c9a38?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005237b0 sp=0x14000523780 pc=0x1004c2fe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0x1400058c100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x14000523860 sp=0x140005237b0 pc=0x1004c78bc net.(*netFD).accept(0x1400058c100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x14000523920 sp=0x14000523860 pc=0x100537b28 net.(*TCPListener).accept(0x1400058a080) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x14000523970 sp=0x14000523920 pc=0x10054c304 net.(*TCPListener).Accept(0x1400058a080) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140005239b0 sp=0x14000523970 pc=0x10054b2ec net/http.(*onceCloseListener).Accept(0x140005a2090?) 
<autogenerated>:1 +0x30 fp=0x140005239d0 sp=0x140005239b0 pc=0x100734cc0 net/http.(*Server).Serve(0x1400012c800, {0x101bdefc0, 0x1400058a080}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x14000523b00 sp=0x140005239d0 pc=0x10070e400 github.com/ollama/ollama/runner/llamarunner.Execute({0x14000132140, 0x4, 0x4}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x14000523cd0 sp=0x14000523b00 pc=0x100937fec github.com/ollama/ollama/runner.Execute({0x14000132130?, 0x0?, 0x0?}) /Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x14000523d10 sp=0x14000523cd0 pc=0x100a746fc github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035600?, {0x101620986?, 0x4?, 0x10162098a?}) /Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x14000523d40 sp=0x14000523d10 pc=0x101179714 github.com/spf13/cobra.(*Command).execute(0x1400030bb08, {0x1400028f9c0, 0x4, 0x4}) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x14000523e60 sp=0x14000523d40 pc=0x1005a69c8 github.com/spf13/cobra.(*Command).ExecuteC(0x140000f8908) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x14000523f20 sp=0x14000523e60 pc=0x1005a7110 github.com/spf13/cobra.(*Command).Execute(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985 main.main() /Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x14000523f40 sp=0x14000523f20 pc=0x10117ae94 runtime.main() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x14000523fd0 sp=0x14000523f40 pc=0x10040f464 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000523fd0 sp=0x14000523fd0 pc=0x10044b414 goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x100442e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.forcegchelper() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x10040f7b8 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x10044b414 created by runtime.init.7 in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24 goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x100442e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.bgsweep(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x1003fa898 runtime.gcenable.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x1003ee698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x10044b414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x101840360?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x100442e98 runtime.goparkunlock(...) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.(*scavengerState).park(0x10268f960) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x1003f832c runtime.bgscavenge(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x1003f88cc runtime.gcenable.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x1003ee638 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x10044b414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac goroutine 18 gp=0x14000102700 m=nil [finalizer wait]: runtime.gopark(0x180006c5c8?, 0x1293d9b88?, 0xc0?, 0x45?, 0x1c0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x100442e98 runtime.runfinq() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x1003ed698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x10044b414 created by runtime.createfing in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80 goroutine 34 gp=0x140002f01c0 m=nil [chan receive]: runtime.gopark(0x140002a9220?, 0x1400031c180?, 0x48?, 0x87?, 0x10050bc58?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000686f0 sp=0x140000686d0 pc=0x100442e98 runtime.chanrecv(0x140002f81c0, 0x0, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x14000068770 sp=0x140000686f0 pc=0x1003dfa0c runtime.chanrecv1(0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x140000687a0 sp=0x14000068770 pc=0x1003df5a4 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x140000687d0 sp=0x140000687a0 pc=0x1003f18bc runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x10044b414 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78 goroutine 35 gp=0x140002f0380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250710 sp=0x140002506f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002507b0 sp=0x14000250710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002507d0 sp=0x140002507b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002507d0 sp=0x140002507d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 5 gp=0x14000003880 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006e7b0 sp=0x1400006e710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 36 gp=0x140002f0540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 20 gp=0x14000103180 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000250f10 sp=0x14000250ef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000250fb0 sp=0x14000250f10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000250fd0 sp=0x14000250fb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000250fd0 sp=0x14000250fd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 21 gp=0x14000103340 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251710 sp=0x140002516f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002517b0 sp=0x14000251710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002517d0 sp=0x140002517b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002517d0 sp=0x140002517d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 6 gp=0x14000003a40 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 22 gp=0x14000103500 m=nil [GC worker (idle)]: runtime.gopark(0x6bf41a057200f?, 0x3?, 0x22?, 0x17?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000251f10 sp=0x14000251ef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000251fb0 sp=0x14000251f10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000251fd0 sp=0x14000251fb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000251fd0 sp=0x14000251fd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 37 gp=0x140002f0a80 m=nil [GC worker (idle)]: runtime.gopark(0x6bf41a0571a5d?, 0x3?, 0x61?, 0xdc?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069f10 sp=0x14000069ef0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000069fb0 sp=0x14000069f10 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 38 gp=0x140002f0c40 m=nil [GC worker (idle)]: runtime.gopark(0x6bf41a0567389?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006a710 sp=0x1400006a6f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006a7b0 sp=0x1400006a710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006a7d0 sp=0x1400006a7b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006a7d0 sp=0x1400006a7d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 50 gp=0x14000484000 m=nil [GC worker (idle)]: runtime.gopark(0x1026dd000?, 0x1?, 0x76?, 0x16?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 7 gp=0x14000003c00 m=nil [GC worker (idle)]: runtime.gopark(0x1026dd000?, 0x1?, 0x2e?, 0xad?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x100442e98 runtime.gcBgMarkWorker(0x140002f9420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x1003f0b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x1003f0a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x10044b414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 67 gp=0x14000484380 m=nil [select]: runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x100442e98 runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x100422ad4 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000590140, {0x101bdf1a0, 0x1400052e700}, 0x1400026f2c0) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1009359dc github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x101bdf1a0?, 0x1400052e700?}, 0x14000045b28?) <autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x100938600 net/http.HandlerFunc.ServeHTTP(0x14000594000?, {0x101bdf1a0?, 0x1400052e700?}, 0x14000045b10?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10070ae28 net/http.(*ServeMux).ServeHTTP(0x10?, {0x101bdf1a0, 0x1400052e700}, 0x1400026f2c0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x10070c9b4 net/http.serverHandler.ServeHTTP({0x101bdb230?}, {0x101bdf1a0?, 0x1400052e700?}, 0x1?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x10072869c net/http.(*conn).serve(0x140005a2090, {0x101be19f8, 0x14000588360}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x1007095cc net/http.(*Server).Serve.gowrap3() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x10070e790 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x10044b414 created by net/http.(*Server).Serve in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8 goroutine 32 gp=0x140002f1180 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x100466c30?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000253580 sp=0x14000253560 pc=0x100442e98 runtime.netpollblock(0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140002535c0 sp=0x14000253580 pc=0x1004088f8 internal/poll.runtime_pollWait(0x12a297cb8, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140002535f0 sp=0x140002535c0 pc=0x100442050 internal/poll.(*pollDesc).wait(0x1400058c180?, 0x14000412041?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000253620 sp=0x140002535f0 pc=0x1004c2fe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x1400058c180, {0x14000412041, 0x1, 0x1}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x140002536c0 sp=0x14000253620 pc=0x1004c429c net.(*netFD).Read(0x1400058c180, {0x14000412041?, 0x14000253758?, 0x100704044?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x14000253710 sp=0x140002536c0 pc=0x1005360f8 net.(*conn).Read(0x14000070030, {0x14000412041?, 0x0?, 0x0?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x14000253760 sp=0x14000253710 pc=0x100542fc4 net/http.(*connReader).backgroundRead(0x14000412030) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x140002537b0 sp=0x14000253760 pc=0x100703f40 net/http.(*connReader).startBackgroundRead.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x140002537d0 sp=0x140002537b0 pc=0x100703e28 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002537d0 sp=0x140002537d0 pc=0x10044b414 created by net/http.(*connReader).startBackgroundRead in goroutine 67 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4 r0 0x0 r1 0x0 r2 0x0 r3 0x0 r4 0x183a09df8 r5 0x171a4b650 r6 0x36 r7 0x0 r8 0x46aa15ae81e80719 r9 
0x46aa15aff04d3719 r10 0x3bb r11 0x6 r12 0x6 r13 0x171a4b382 r14 0x1023263a8 r15 0x1 r16 0x148 r17 0x1f3a40ac0 r18 0x0 r19 0x6 r20 0xc03 r21 0x171a530e0 r22 0x0 r23 0x2 r24 0x12a504d68 r25 0x171a4ca08 r26 0xcf05e82c0 r27 0xcf05e8000 r28 0x1 r29 0x171a4bf40 lr 0x183afd88c sp 0x171a4bf20 pc 0x183ac4388 fault 0x183ac4388 time=2026-04-02T10:42:35.087-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2" time=2026-04-02T10:42:35.087-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50202/completion\": EOF" [GIN] 2026/04/02 - 10:42:35 | 500 | 20.477471958s | 127.0.0.1 | POST "/api/chat" [GIN] 2026/04/02 - 10:49:38 | 200 | 59.042µs | 127.0.0.1 | GET "/api/version" [GIN] 2026/04/02 - 10:52:21 | 200 | 43.292µs | 127.0.0.1 | HEAD "/" [GIN] 2026/04/02 - 10:52:21 | 200 | 101.348667ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/04/02 - 10:52:22 | 200 | 52.597916ms | 127.0.0.1 | POST "/api/show" llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12 llama_model_loader: - kv 3: general.version str = 2024-12 llama_model_loader: - kv 4: general.finetune str = Instruct llama_model_loader: - kv 5: general.basename str = Llama-3.1 llama_model_loader: - kv 6: general.size_label str = 70B llama_model_loader: - kv 7: general.license str = llama3.1 llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla... llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ... 
llama_model_loader: - kv 14: llama.block_count u32 = 80 llama_model_loader: - kv 15: llama.context_length u32 = 131072 llama_model_loader: - kv 16: llama.embedding_length u32 = 8192 llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 18: llama.attention.head_count u32 = 64 llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 22: llama.attention.key_length u32 = 128 llama_model_loader: - kv 23: llama.attention.value_length u32 = 128 llama_model_loader: - kv 24: general.file_type u32 = 15 llama_model_loader: - kv 25: llama.vocab_size u32 = 128256 llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 35: general.quantization_version u32 = 2 llama_model_loader: - type f32: 162 tensors llama_model_loader: - type q4_K: 441 tensors llama_model_loader: - type q5_K: 40 tensors llama_model_loader: - type q6_K: 81 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 39.59 GiB (4.82 BPW) load: printing all EOG tokens: load: - 128001 ('<|end_of_text|>') load: - 128008 ('<|eom_id|>') load: - 128009 ('<|eot_id|>') load: special tokens cache size = 256 load: token to piece cache size = 0.7999 MB print_info: arch = llama print_info: vocab_only = 1 print_info: no_alloc = 0 print_info: model type = ?B print_info: model params = 70.55 B print_info: general.name = Llama 3.1 70B Instruct 2024 12 print_info: vocab type = BPE print_info: n_vocab = 128256 print_info: n_merges = 280147 print_info: BOS token = 128000 '<|begin_of_text|>' print_info: EOS token = 128009 '<|eot_id|>' print_info: EOT token = 128001 '<|end_of_text|>' print_info: EOM token = 128008 '<|eom_id|>' print_info: LF token = 198 'Ċ' print_info: EOG token = 128001 '<|end_of_text|>' print_info: EOG token = 128008 '<|eom_id|>' print_info: EOG token = 128009 '<|eot_id|>' print_info: max token length = 256 llama_model_load: vocab only - skipping tensors time=2026-04-02T10:52:22.243-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072 time=2026-04-02T10:52:22.243-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50225" time=2026-04-02T10:52:22.249-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="60.9 GiB" free_swap="0 B" time=2026-04-02T10:52:22.249-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" 
free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:52:22.249-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:52:22.250-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
time=2026-04-02T10:52:22.275-07:00 level=INFO source=runner.go:965 msg="starting go runner"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 51539.61 MB
time=2026-04-02T10:52:22.277-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2026-04-02T10:52:22.353-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50225"
time=2026-04-02T10:52:22.362-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
time=2026-04-02T10:52:22.362-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free
time=2026-04-02T10:52:22.362-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12 llama_model_loader: - kv 3: general.version str = 2024-12 llama_model_loader: - kv 4: general.finetune str = Instruct llama_model_loader: - kv 5: general.basename str = Llama-3.1 llama_model_loader: - kv 6: general.size_label str = 70B llama_model_loader: - kv 7: general.license str = llama3.1 llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla... llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ... 
llama_model_loader: - kv 14: llama.block_count u32 = 80 llama_model_loader: - kv 15: llama.context_length u32 = 131072 llama_model_loader: - kv 16: llama.embedding_length u32 = 8192 llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 18: llama.attention.head_count u32 = 64 llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 22: llama.attention.key_length u32 = 128 llama_model_loader: - kv 23: llama.attention.value_length u32 = 128 llama_model_loader: - kv 24: general.file_type u32 = 15 llama_model_loader: - kv 25: llama.vocab_size u32 = 128256 llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 35: general.quantization_version u32 = 2 llama_model_loader: - type f32: 162 tensors llama_model_loader: - type q4_K: 441 tensors llama_model_loader: - type q5_K: 40 tensors llama_model_loader: - type q6_K: 81 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 39.59 GiB (4.82 BPW) load: printing all EOG tokens: load: - 128001 ('<|end_of_text|>') load: - 128008 ('<|eom_id|>') load: - 128009 ('<|eot_id|>') load: special tokens cache size = 256 load: token to piece cache size = 0.7999 MB print_info: arch = llama print_info: vocab_only = 0 print_info: no_alloc = 0 print_info: n_ctx_train = 131072 print_info: n_embd = 8192 print_info: n_embd_inp = 8192 print_info: n_layer = 80 print_info: n_head = 64 print_info: n_head_kv = 8 print_info: n_rot = 128 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 8 print_info: n_embd_k_gqa = 1024 print_info: n_embd_v_gqa = 1024 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-05 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 28672 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 1 print_info: pooling type = 0 print_info: rope type = 0 print_info: rope scaling = linear print_info: freq_base_train = 500000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 131072 print_info: rope_yarn_log_mul= 0.0000 print_info: rope_finetuned = unknown print_info: model type = 70B print_info: model params = 70.55 B print_info: general.name = Llama 3.1 70B Instruct 2024 12 print_info: vocab type = BPE print_info: n_vocab = 128256 print_info: n_merges = 280147 print_info: BOS token = 128000 '<|begin_of_text|>' print_info: EOS token = 128009 
'<|eot_id|>' print_info: EOT token = 128001 '<|end_of_text|>' print_info: EOM token = 128008 '<|eom_id|>' print_info: LF token = 198 'Ċ' print_info: EOG token = 128001 '<|end_of_text|>' print_info: EOG token = 128008 '<|eom_id|>' print_info: EOG token = 128009 '<|eot_id|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... (mmap = true) load_tensors: offloading 30 repeating layers to GPU load_tensors: offloaded 30/81 layers to GPU load_tensors: CPU_Mapped model buffer size = 40543.11 MiB load_tensors: Metal_Mapped model buffer size = 39721.13 MiB llama_context: constructing llama_context llama_context: n_seq_max = 1 llama_context: n_ctx = 131072 llama_context: n_ctx_seq = 131072 llama_context: n_batch = 512 llama_context: n_ubatch = 512 llama_context: causal_attn = 1 llama_context: flash_attn = auto llama_context: kv_unified = false llama_context: freq_base = 500000.0 llama_context: freq_scale = 1 ggml_metal_init: allocating ggml_metal_init: picking default device: Apple M4 Pro ggml_metal_init: use fusion = true ggml_metal_init: use concurrency = true ggml_metal_init: use graph optimize = true llama_context: CPU output buffer size = 0.52 MiB llama_kv_cache: CPU KV buffer size = 25600.00 MiB llama_kv_cache: Metal KV buffer size = 15360.00 MiB llama_kv_cache: size = 40960.00 MiB (131072 cells, 80 layers, 1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB llama_context: Flash Attention was auto, set to enabled llama_context: Metal compute buffer size = 328.01 MiB llama_context: CPU compute buffer size = 448.01 MiB llama_context: graph nodes = 2487 llama_context: graph splits = 503 (with bs=512), 3 (with bs=1) time=2026-04-02T10:52:26.632-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.38 seconds" time=2026-04-02T10:52:26.632-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1 time=2026-04-02T10:52:26.632-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start 
responding" time=2026-04-02T10:52:26.633-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.38 seconds" [GIN] 2026/04/02 - 10:52:26 | 200 | 4.576389625s | 127.0.0.1 | POST "/api/generate" ggml_metal_synchronize: error: command buffer 0 failed with status 5 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) ggml-metal-context.m:235: fatal error WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info. WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash. See: https://github.com/ggml-org/llama.cpp/pull/17869 0 ollama 0x00000001030b6ae4 ggml_print_backtrace + 276 1 ollama 0x00000001030b6cd0 ggml_abort + 156 2 ollama 0x000000010331f340 ggml_metal_synchronize + 208 3 ollama 0x00000001030d5ae0 ggml_backend_sched_graph_compute_async + 924 4 ollama 0x000000010314b888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160 5 ollama 0x000000010314b538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588 6 ollama 0x000000010314cc04 _ZN13llama_context6decodeERK11llama_batch + 1556 7 ollama 0x00000001031514a0 llama_decode + 20 8 ollama 0x000000010306f3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72 9 ollama 0x000000010219f20c ollama + 520716 SIGABRT: abort PC=0x183ac4388 m=10 sigcode=0 signal arrived during cgo execution goroutine 13 gp=0x14000486380 m=10 mp=0x140000a5808 [syscall]: runtime.cgocall(0x10306f398, 0x14000083b58) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000083b20 sp=0x14000083ae0 pc=0x102193974 github.com/ollama/ollama/llama._Cfunc_llama_decode(0x12dc04760, {0x10, 0x168025200, 0x0, 0x168025a00, 0x168026200, 0x168026a00, 0x12dc05950}) _cgo_gotypes.go:685 +0x34 fp=0x14000083b50 sp=0x14000083b20 pc=0x1025e4c44 github.com/ollama/ollama/llama.(*Context).Decode.func1(...) 
/Users/runner/work/ollama/ollama/llama/llama.go:173 github.com/ollama/ollama/llama.(*Context).Decode(0x14000035400?, 0x1021972f8?) /Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000083c40 sp=0x14000083b50 pc=0x1025e7008 github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000276140, 0x14000134140, 0x1400024a718) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000083ed0 sp=0x14000083c40 pc=0x102688058 github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000276140, {0x103935a30, 0x14000528370}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000083fa0 sp=0x14000083ed0 pc=0x102687d04 github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000083fd0 sp=0x14000083fa0 pc=0x10268c210 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000083fd0 sp=0x14000083fd0 pc=0x10219f414 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000337710 sp=0x140003376f0 pc=0x102196e98 runtime.netpollblock(0x140004a77a8?, 0x221b7d0?, 0x1?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x14000337750 sp=0x14000337710 pc=0x10215c8f8 internal/poll.runtime_pollWait(0x12b77b150, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x14000337780 sp=0x14000337750 pc=0x102196050 internal/poll.(*pollDesc).wait(0x14000274100?, 0x10221da38?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140003377b0 sp=0x14000337780 pc=0x102216fe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0x14000274100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x14000337860 sp=0x140003377b0 pc=0x10221b8bc net.(*netFD).accept(0x14000274100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x14000337920 sp=0x14000337860 pc=0x10228bb28 net.(*TCPListener).accept(0x1400051e480) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x14000337970 sp=0x14000337920 pc=0x1022a0304 net.(*TCPListener).Accept(0x1400051e480) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140003379b0 sp=0x14000337970 pc=0x10229f2ec net/http.(*onceCloseListener).Accept(0x140003a8090?) 
<autogenerated>:1 +0x30 fp=0x140003379d0 sp=0x140003379b0 pc=0x102488cc0 net/http.(*Server).Serve(0x140005c4200, {0x103932fc0, 0x1400051e480}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x14000337b00 sp=0x140003379d0 pc=0x102462400 github.com/ollama/ollama/runner/llamarunner.Execute({0x140000322c0, 0x4, 0x4}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x14000337cd0 sp=0x14000337b00 pc=0x10268bfec github.com/ollama/ollama/runner.Execute({0x140000322b0?, 0x0?, 0x0?}) /Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x14000337d10 sp=0x14000337cd0 pc=0x1027c86fc github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035b00?, {0x103374986?, 0x4?, 0x10337498a?}) /Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x14000337d40 sp=0x14000337d10 pc=0x102ecd714 github.com/spf13/cobra.(*Command).execute(0x14000305b08, {0x14000439380, 0x4, 0x4}) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x14000337e60 sp=0x14000337d40 pc=0x1022fa9c8 github.com/spf13/cobra.(*Command).ExecuteC(0x14000124908) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x14000337f20 sp=0x14000337e60 pc=0x1022fb110 github.com/spf13/cobra.(*Command).Execute(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985 main.main() /Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x14000337f40 sp=0x14000337f20 pc=0x102ecee94 runtime.main() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x14000337fd0 sp=0x14000337f40 pc=0x102163464 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000337fd0 sp=0x14000337fd0 pc=0x10219f414 goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x102196e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.forcegchelper() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x1021637b8 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x10219f414 created by runtime.init.7 in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24 goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x102196e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.bgsweep(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x10214e898 runtime.gcenable.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x102142698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x10219f414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x103594360?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x102196e98 runtime.goparkunlock(...) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.(*scavengerState).park(0x1043e3960) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x10214c32c runtime.bgscavenge(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x10214c8cc runtime.gcenable.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x102142638 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x10219f414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac goroutine 5 gp=0x14000003c00 m=nil [finalizer wait]: runtime.gopark(0x180006c5c8?, 0x104603ef0?, 0x8?, 0x81?, 0x1c0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x102196e98 runtime.runfinq() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x102141698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x10219f414 created by runtime.createfing in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80 goroutine 18 gp=0x14000102540 m=nil [chan receive]: runtime.gopark(0x14000137180?, 0x1400041c018?, 0x48?, 0x87?, 0x10225fc58?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000686f0 sp=0x140000686d0 pc=0x102196e98 runtime.chanrecv(0x140002801c0, 0x0, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x14000068770 sp=0x140000686f0 pc=0x102133a0c runtime.chanrecv1(0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x140000687a0 sp=0x14000068770 pc=0x1021335a4 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x140000687d0 sp=0x140000687a0 pc=0x1021458bc runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x10219f414 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78 goroutine 19 gp=0x14000102700 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 20 gp=0x140001028c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 6 gp=0x140001dc540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006e7b0 sp=0x1400006e710 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 7 gp=0x140001dc700 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 8 gp=0x140001dc8c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 9 gp=0x140001dca80 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ff10 sp=0x1400006fef0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006ffb0 sp=0x1400006ff10 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006ffd0 sp=0x1400006ffb0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006ffd0 sp=0x1400006ffd0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 10 gp=0x140001dcc40 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024c710 sp=0x1400024c6f0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024c7b0 sp=0x1400024c710 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024c7d0 sp=0x1400024c7b0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024c7d0 sp=0x1400024c7d0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 34 gp=0x14000306000 m=nil [GC worker (idle)]: runtime.gopark(0x104431000?, 0x1?, 0xac?, 0x14?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000248710 sp=0x140002486f0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140002487b0 sp=0x14000248710 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140002487d0 sp=0x140002487b0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140002487d0 sp=0x140002487d0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 11 gp=0x140001dce00 m=nil [GC worker (idle)]: runtime.gopark(0x6bfd195b0c80e?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024cf10 sp=0x1400024cef0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024cfb0 sp=0x1400024cf10 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024cfd0 sp=0x1400024cfb0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024cfd0 sp=0x1400024cfd0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 12 gp=0x140001dcfc0 m=nil [GC worker (idle)]: runtime.gopark(0x6bfd195b05937?, 0x1?, 0x2f?, 0x83?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024d710 sp=0x1400024d6f0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400024d7b0 sp=0x1400024d710 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400024d7d0 sp=0x1400024d7b0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024d7d0 sp=0x1400024d7d0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 35 gp=0x140003061c0 m=nil [GC worker (idle)]: runtime.gopark(0x6bfd195b0a492?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000248f10 sp=0x14000248ef0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000248fb0 sp=0x14000248f10 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000248fd0 sp=0x14000248fb0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000248fd0 sp=0x14000248fd0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 21 gp=0x14000102a80 m=nil [GC worker (idle)]: runtime.gopark(0x6bfd195b0c520?, 0x3?, 0x46?, 0x11?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069f10 sp=0x14000069ef0 pc=0x102196e98 runtime.gcBgMarkWorker(0x14000281420) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000069fb0 sp=0x14000069f10 pc=0x102144b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000069fd0 sp=0x14000069fb0 pc=0x102144a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000069fd0 sp=0x14000069fd0 pc=0x10219f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 14 gp=0x14000486540 m=nil [select]: runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x102196e98 runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x102176ad4 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000276140, {0x1039331a0, 0x140001c47e0}, 0x140005fe640) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1026899dc github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x1039331a0?, 0x140001c47e0?}, 0x1400032fb28?) <autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x10268c600 net/http.HandlerFunc.ServeHTTP(0x1400027a000?, {0x1039331a0?, 0x140001c47e0?}, 0x1400032fb10?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10245ee28 net/http.(*ServeMux).ServeHTTP(0x10?, {0x1039331a0, 0x140001c47e0}, 0x140005fe640) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x1024609b4 net/http.serverHandler.ServeHTTP({0x10392f230?}, {0x1039331a0?, 0x140001c47e0?}, 0x1?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x10247c69c net/http.(*conn).serve(0x140003a8090, {0x1039359f8, 0x14000272360}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x10245d5cc net/http.(*Server).Serve.gowrap3() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x102462790 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x10219f414 created by net/http.(*Server).Serve in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8 goroutine 70 gp=0x14000306540 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1021bac30?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400024ad80 sp=0x1400024ad60 pc=0x102196e98 runtime.netpollblock(0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x1400024adc0 sp=0x1400024ad80 pc=0x10215c8f8 internal/poll.runtime_pollWait(0x12b77b038, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x1400024adf0 sp=0x1400024adc0 pc=0x102196050 internal/poll.(*pollDesc).wait(0x14000274180?, 0x140002974e1?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400024ae20 sp=0x1400024adf0 pc=0x102216fe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x14000274180, {0x140002974e1, 0x1, 0x1}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x1400024aec0 sp=0x1400024ae20 pc=0x10221829c net.(*netFD).Read(0x14000274180, {0x140002974e1?, 0x1400024af58?, 0x102458044?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x1400024af10 sp=0x1400024aec0 pc=0x10228a0f8 net.(*conn).Read(0x140005a4050, {0x140002974e1?, 0x0?, 0x0?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x1400024af60 sp=0x1400024af10 pc=0x102296fc4 net/http.(*connReader).backgroundRead(0x140002974d0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x1400024afb0 sp=0x1400024af60 pc=0x102457f40 net/http.(*connReader).startBackgroundRead.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x1400024afd0 sp=0x1400024afb0 pc=0x102457e28 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400024afd0 sp=0x1400024afd0 pc=0x10219f414 created by net/http.(*connReader).startBackgroundRead in goroutine 14 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4 r0 0x0 r1 0x0 r2 0x0 r3 0x0 r4 0x183a09df8 r5 0x17264b650 r6 0x36 r7 0x0 r8 0xaa98e3e9645a0b8a r9 
0xaa98e3e8163f3b8a r10 0x3bb r11 0x6 r12 0x6 r13 0x17264b382 r14 0x1023263a8 r15 0x1 r16 0x148 r17 0x1f3a40ac0 r18 0x0 r19 0x6 r20 0x3713 r21 0x1726530e0 r22 0x0 r23 0x2 r24 0x12dc04d68 r25 0x17264ca08 r26 0xcf80082c0 r27 0xcf8008000 r28 0x1 r29 0x17264bf40 lr 0x183afd88c sp 0x17264bf20 pc 0x183ac4388 fault 0x183ac4388 time=2026-04-02T10:52:51.289-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2" time=2026-04-02T10:52:51.289-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50225/completion\": EOF" [GIN] 2026/04/02 - 10:52:51 | 500 | 17.269655s | 127.0.0.1 | POST "/api/chat" [GIN] 2026/04/02 - 10:53:38 | 200 | 41.375µs | 127.0.0.1 | HEAD "/" [GIN] 2026/04/02 - 10:53:39 | 200 | 97.700291ms | 127.0.0.1 | POST "/api/show" [GIN] 2026/04/02 - 10:53:39 | 200 | 52.942917ms | 127.0.0.1 | POST "/api/show" llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12 llama_model_loader: - kv 3: general.version str = 2024-12 llama_model_loader: - kv 4: general.finetune str = Instruct llama_model_loader: - kv 5: general.basename str = Llama-3.1 llama_model_loader: - kv 6: general.size_label str = 70B llama_model_loader: - kv 7: general.license str = llama3.1 llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla... llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ... 
llama_model_loader: - kv 14: llama.block_count u32 = 80 llama_model_loader: - kv 15: llama.context_length u32 = 131072 llama_model_loader: - kv 16: llama.embedding_length u32 = 8192 llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 18: llama.attention.head_count u32 = 64 llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 22: llama.attention.key_length u32 = 128 llama_model_loader: - kv 23: llama.attention.value_length u32 = 128 llama_model_loader: - kv 24: general.file_type u32 = 15 llama_model_loader: - kv 25: llama.vocab_size u32 = 128256 llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 35: general.quantization_version u32 = 2 llama_model_loader: - type f32: 162 tensors llama_model_loader: - type q4_K: 441 tensors llama_model_loader: - type q5_K: 40 tensors llama_model_loader: - type q6_K: 81 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 39.59 GiB (4.82 BPW) load: printing all EOG tokens: load: - 128001 ('<|end_of_text|>') load: - 128008 ('<|eom_id|>') load: - 128009 ('<|eot_id|>') load: special tokens cache size = 256 load: token to piece cache size = 0.7999 MB print_info: arch = llama print_info: vocab_only = 1 print_info: no_alloc = 0 print_info: model type = ?B print_info: model params = 70.55 B print_info: general.name = Llama 3.1 70B Instruct 2024 12 print_info: vocab type = BPE print_info: n_vocab = 128256 print_info: n_merges = 280147 print_info: BOS token = 128000 '<|begin_of_text|>' print_info: EOS token = 128009 '<|eot_id|>' print_info: EOT token = 128001 '<|end_of_text|>' print_info: EOM token = 128008 '<|eom_id|>' print_info: LF token = 198 'Ċ' print_info: EOG token = 128001 '<|end_of_text|>' print_info: EOG token = 128008 '<|eom_id|>' print_info: EOG token = 128009 '<|eot_id|>' print_info: max token length = 256 llama_model_load: vocab only - skipping tensors time=2026-04-02T10:53:39.295-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072 time=2026-04-02T10:53:39.296-07:00 level=INFO source=server.go:432 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d --port 50238" time=2026-04-02T10:53:39.300-07:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="61.0 GiB" free_swap="0 B" time=2026-04-02T10:53:39.300-07:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=Metal available="47.5 GiB" 
free="48.0 GiB" minimum="512.0 MiB" overhead="0 B"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=server.go:499 msg="loading model" "model layers"=81 requested=-1
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:240 msg="model weights" device=Metal size="14.5 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="24.6 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:251 msg="kv cache" device=Metal size="15.0 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="25.0 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:262 msg="compute graph" device=Metal size="16.3 GiB"
time=2026-04-02T10:53:39.300-07:00 level=INFO source=device.go:272 msg="total memory" size="95.4 GiB"
time=2026-04-02T10:53:39.325-07:00 level=INFO source=runner.go:965 msg="starting go runner"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.007 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: Apple M4 Pro
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. 
= true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = false ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 51539.61 MB time=2026-04-02T10:53:39.328-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang) time=2026-04-02T10:53:39.403-07:00 level=INFO source=runner.go:1001 msg="Server listening on 127.0.0.1:50238" time=2026-04-02T10:53:39.412-07:00 level=INFO source=runner.go:895 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Auto KvSize:131072 KvCacheType: NumThreads:8 GPULayers:30[ID:0 Layers:30(50..79)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" llama_model_load_from_file_impl: using device Metal (Apple M4 Pro) (unknown id) - 49150 MiB free time=2026-04-02T10:53:39.412-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding" time=2026-04-02T10:53:39.413-07:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 36 key-value pairs and 724 tensors from /Users/micseydel/.ollama/models/blobs/sha256-4824460d29f2058aaf6e1118a63a7a197a09bed509f0e7d4e2efb1ee273b447d (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Llama 3.1 70B Instruct 2024 12 llama_model_loader: - kv 3: general.version str = 2024-12 llama_model_loader: - kv 4: general.finetune str = Instruct llama_model_loader: - kv 5: general.basename str = Llama-3.1 llama_model_loader: - kv 6: general.size_label str = 70B llama_model_loader: - kv 7: general.license str = llama3.1 llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.1 70B llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla... llama_model_loader: - kv 12: general.tags arr[str,5] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 13: general.languages arr[str,7] = ["fr", "it", "pt", "hi", "es", "th", ... 
llama_model_loader: - kv 14: llama.block_count u32 = 80 llama_model_loader: - kv 15: llama.context_length u32 = 131072 llama_model_loader: - kv 16: llama.embedding_length u32 = 8192 llama_model_loader: - kv 17: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 18: llama.attention.head_count u32 = 64 llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 20: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 22: llama.attention.key_length u32 = 128 llama_model_loader: - kv 23: llama.attention.value_length u32 = 128 llama_model_loader: - kv 24: general.file_type u32 = 15 llama_model_loader: - kv 25: llama.vocab_size u32 = 128256 llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 28: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 34: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 35: general.quantization_version u32 = 2 llama_model_loader: - type f32: 162 tensors llama_model_loader: - type q4_K: 441 tensors llama_model_loader: - type q5_K: 40 tensors llama_model_loader: - type q6_K: 81 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 39.59 GiB (4.82 BPW) load: printing all EOG tokens: load: - 128001 ('<|end_of_text|>') load: - 128008 ('<|eom_id|>') load: - 128009 ('<|eot_id|>') load: special tokens cache size = 256 load: token to piece cache size = 0.7999 MB print_info: arch = llama print_info: vocab_only = 0 print_info: no_alloc = 0 print_info: n_ctx_train = 131072 print_info: n_embd = 8192 print_info: n_embd_inp = 8192 print_info: n_layer = 80 print_info: n_head = 64 print_info: n_head_kv = 8 print_info: n_rot = 128 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 8 print_info: n_embd_k_gqa = 1024 print_info: n_embd_v_gqa = 1024 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-05 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 28672 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 1 print_info: pooling type = 0 print_info: rope type = 0 print_info: rope scaling = linear print_info: freq_base_train = 500000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 131072 print_info: rope_yarn_log_mul= 0.0000 print_info: rope_finetuned = unknown print_info: model type = 70B print_info: model params = 70.55 B print_info: general.name = Llama 3.1 70B Instruct 2024 12 print_info: vocab type = BPE print_info: n_vocab = 128256 print_info: n_merges = 280147 print_info: BOS token = 128000 '<|begin_of_text|>' print_info: EOS token = 128009 
'<|eot_id|>' print_info: EOT token = 128001 '<|end_of_text|>' print_info: EOM token = 128008 '<|eom_id|>' print_info: LF token = 198 'Ċ' print_info: EOG token = 128001 '<|end_of_text|>' print_info: EOG token = 128008 '<|eom_id|>' print_info: EOG token = 128009 '<|eot_id|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... (mmap = true) load_tensors: offloading 30 repeating layers to GPU load_tensors: offloaded 30/81 layers to GPU load_tensors: CPU_Mapped model buffer size = 40543.11 MiB load_tensors: Metal_Mapped model buffer size = 39721.13 MiB llama_context: constructing llama_context llama_context: n_seq_max = 1 llama_context: n_ctx = 131072 llama_context: n_ctx_seq = 131072 llama_context: n_batch = 512 llama_context: n_ubatch = 512 llama_context: causal_attn = 1 llama_context: flash_attn = auto llama_context: kv_unified = false llama_context: freq_base = 500000.0 llama_context: freq_scale = 1 ggml_metal_init: allocating ggml_metal_init: picking default device: Apple M4 Pro ggml_metal_init: use fusion = true ggml_metal_init: use concurrency = true ggml_metal_init: use graph optimize = true llama_context: CPU output buffer size = 0.52 MiB llama_kv_cache: CPU KV buffer size = 25600.00 MiB llama_kv_cache: Metal KV buffer size = 15360.00 MiB llama_kv_cache: size = 40960.00 MiB (131072 cells, 80 layers, 1/1 seqs), K (f16): 20480.00 MiB, V (f16): 20480.00 MiB llama_context: Flash Attention was auto, set to enabled llama_context: Metal compute buffer size = 328.01 MiB llama_context: CPU compute buffer size = 448.01 MiB llama_context: graph nodes = 2487 llama_context: graph splits = 503 (with bs=512), 3 (with bs=1) time=2026-04-02T10:53:43.934-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.63 seconds" time=2026-04-02T10:53:43.934-07:00 level=INFO source=sched.go:561 msg="loaded runners" count=1 time=2026-04-02T10:53:43.935-07:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start 
responding" time=2026-04-02T10:53:43.935-07:00 level=INFO source=server.go:1390 msg="llama runner started in 4.64 seconds" [GIN] 2026/04/02 - 10:53:43 | 200 | 4.838439s | 127.0.0.1 | POST "/api/generate" ggml_metal_synchronize: error: command buffer 0 failed with status 5 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) ggml-metal-context.m:235: fatal error WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info. WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash. See: https://github.com/ggml-org/llama.cpp/pull/17869 0 ollama 0x0000000101d76ae4 ggml_print_backtrace + 276 1 ollama 0x0000000101d76cd0 ggml_abort + 156 2 ollama 0x0000000101fdf340 ggml_metal_synchronize + 208 3 ollama 0x0000000101d95ae0 ggml_backend_sched_graph_compute_async + 924 4 ollama 0x0000000101e0b888 _ZN13llama_context13graph_computeEP11ggml_cgraphb + 160 5 ollama 0x0000000101e0b538 _ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status + 588 6 ollama 0x0000000101e0cc04 _ZN13llama_context6decodeERK11llama_batch + 1556 7 ollama 0x0000000101e114a0 llama_decode + 20 8 ollama 0x0000000101d2f3e0 _cgo_7e52092beca7_Cfunc_llama_decode + 72 9 ollama 0x0000000100e5f20c ollama + 520716 SIGABRT: abort PC=0x183ac4388 m=10 sigcode=0 signal arrived during cgo execution goroutine 50 gp=0x140004ea540 m=10 mp=0x14000428808 [syscall]: runtime.cgocall(0x101d2f398, 0x14000085b58) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/cgocall.go:167 +0x44 fp=0x14000085b20 sp=0x14000085ae0 pc=0x100e53974 github.com/ollama/ollama/llama._Cfunc_llama_decode(0x152f04dd0, {0x10, 0x15080bc00, 0x0, 0x15082f400, 0x15082fc00, 0x150808800, 0x14c804710}) _cgo_gotypes.go:685 +0x34 fp=0x14000085b50 sp=0x14000085b20 pc=0x1012a4c44 github.com/ollama/ollama/llama.(*Context).Decode.func1(...) 
/Users/runner/work/ollama/ollama/llama/llama.go:173 github.com/ollama/ollama/llama.(*Context).Decode(0x14000034300?, 0x100e572f8?) /Users/runner/work/ollama/ollama/llama/llama.go:173 +0xc8 fp=0x14000085c40 sp=0x14000085b50 pc=0x1012a7008 github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0x14000596140, 0x1400033a8c0, 0x1400030ef18) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:494 +0x1e8 fp=0x14000085ed0 sp=0x14000085c40 pc=0x101348058 github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x14000596140, {0x1025f5a30, 0x14000618190}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:387 +0x164 fp=0x14000085fa0 sp=0x14000085ed0 pc=0x101347d04 github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x30 fp=0x14000085fd0 sp=0x14000085fa0 pc=0x10134c210 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000085fd0 sp=0x14000085fd0 pc=0x100e5f414 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:981 +0x44c goroutine 1 gp=0x140000021c0 m=nil [IO wait, locked to thread]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140005bb710 sp=0x140005bb6f0 pc=0x100e56e98 runtime.netpollblock(0x140005bb7a8?, 0xedb7d0?, 0x1?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x140005bb750 sp=0x140005bb710 pc=0x100e1c8f8 internal/poll.runtime_pollWait(0x14b218b50, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x140005bb780 sp=0x140005bb750 pc=0x100e56050 internal/poll.(*pollDesc).wait(0x14000594100?, 0x100dfeccc?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140005bb7b0 sp=0x140005bb780 pc=0x100ed6fe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0x14000594100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:620 +0x24c fp=0x140005bb860 sp=0x140005bb7b0 pc=0x100edb8bc net.(*netFD).accept(0x14000594100) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_unix.go:172 +0x28 fp=0x140005bb920 sp=0x140005bb860 pc=0x100f4bb28 net.(*TCPListener).accept(0x140005100c0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock_posix.go:159 +0x24 fp=0x140005bb970 sp=0x140005bb920 pc=0x100f60304 net.(*TCPListener).Accept(0x140005100c0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/tcpsock.go:380 +0x2c fp=0x140005bb9b0 sp=0x140005bb970 pc=0x100f5f2ec net/http.(*onceCloseListener).Accept(0x140002a8240?) 
<autogenerated>:1 +0x30 fp=0x140005bb9d0 sp=0x140005bb9b0 pc=0x101148cc0 net/http.(*Server).Serve(0x1400012c800, {0x1025f2fc0, 0x140005100c0}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3424 +0x290 fp=0x140005bbb00 sp=0x140005bb9d0 pc=0x101122400 github.com/ollama/ollama/runner/llamarunner.Execute({0x14000132140, 0x4, 0x4}) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:1002 +0x7ac fp=0x140005bbcd0 sp=0x140005bbb00 pc=0x10134bfec github.com/ollama/ollama/runner.Execute({0x14000132130?, 0x0?, 0x0?}) /Users/runner/work/ollama/ollama/runner/runner.go:25 +0x1cc fp=0x140005bbd10 sp=0x140005bbcd0 pc=0x1014886fc github.com/ollama/ollama/cmd.NewCLI.func3(0x14000035500?, {0x102034986?, 0x4?, 0x10203498a?}) /Users/runner/work/ollama/ollama/cmd/cmd.go:2273 +0x54 fp=0x140005bbd40 sp=0x140005bbd10 pc=0x101b8d714 github.com/spf13/cobra.(*Command).execute(0x14000363b08, {0x14000327580, 0x4, 0x4}) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x140005bbe60 sp=0x140005bbd40 pc=0x100fba9c8 github.com/spf13/cobra.(*Command).ExecuteC(0x14000290908) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x140005bbf20 sp=0x140005bbe60 pc=0x100fbb110 github.com/spf13/cobra.(*Command).Execute(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985 main.main() /Users/runner/work/ollama/ollama/main.go:12 +0x54 fp=0x140005bbf40 sp=0x140005bbf20 pc=0x101b8ee94 runtime.main() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:283 +0x284 fp=0x140005bbfd0 sp=0x140005bbf40 pc=0x100e23464 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140005bbfd0 sp=0x140005bbfd0 pc=0x100e5f414 goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006cf90 sp=0x1400006cf70 pc=0x100e56e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.forcegchelper() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:348 +0xb8 fp=0x1400006cfd0 sp=0x1400006cf90 pc=0x100e237b8 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006cfd0 sp=0x1400006cfd0 pc=0x100e5f414 created by runtime.init.7 in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:336 +0x24 goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006d760 sp=0x1400006d740 pc=0x100e56e98 runtime.goparkunlock(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.bgsweep(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcsweep.go:316 +0x108 fp=0x1400006d7b0 sp=0x1400006d760 pc=0x100e0e898 runtime.gcenable.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x28 fp=0x1400006d7d0 sp=0x1400006d7b0 pc=0x100e02698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006d7d0 sp=0x1400006d7d0 pc=0x100e5f414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:204 +0x6c goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x102254360?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006df60 sp=0x1400006df40 pc=0x100e56e98 runtime.goparkunlock(...) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:441 runtime.(*scavengerState).park(0x1030a3960) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:425 +0x5c fp=0x1400006df90 sp=0x1400006df60 pc=0x100e0c32c runtime.bgscavenge(0x14000098000) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgcscavenge.go:658 +0xac fp=0x1400006dfb0 sp=0x1400006df90 pc=0x100e0c8cc runtime.gcenable.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0x28 fp=0x1400006dfd0 sp=0x1400006dfb0 pc=0x100e02638 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006dfd0 sp=0x1400006dfd0 pc=0x100e5f414 created by runtime.gcenable in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:205 +0xac goroutine 18 gp=0x14000102700 m=nil [finalizer wait]: runtime.gopark(0x180006c5c8?, 0x10338db88?, 0xc0?, 0x85?, 0x1c0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006c590 sp=0x1400006c570 pc=0x100e56e98 runtime.runfinq() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:196 +0x108 fp=0x1400006c7d0 sp=0x1400006c590 pc=0x100e01698 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006c7d0 sp=0x1400006c7d0 pc=0x100e5f414 created by runtime.createfing in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mfinal.go:166 +0x80 goroutine 5 gp=0x14000003a40 m=nil [chan receive]: runtime.gopark(0x140000b72c0?, 0x14000336048?, 0x48?, 0xe7?, 0x100f1fc58?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006e6f0 sp=0x1400006e6d0 pc=0x100e56e98 runtime.chanrecv(0x1400003a230, 0x0, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:664 +0x42c fp=0x1400006e770 sp=0x1400006e6f0 pc=0x100df3a0c runtime.chanrecv1(0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/chan.go:506 +0x14 fp=0x1400006e7a0 sp=0x1400006e770 pc=0x100df35a4 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1799 +0x3c fp=0x1400006e7d0 sp=0x1400006e7a0 pc=0x100e058bc runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x100e5f414 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1794 +0x78 goroutine 6 gp=0x14000003c00 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ef10 sp=0x1400006eef0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006efb0 sp=0x1400006ef10 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006efd0 sp=0x1400006efb0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006efd0 sp=0x1400006efd0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 19 gp=0x14000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068710 sp=0x140000686f0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000687b0 sp=0x14000068710 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000687d0 sp=0x140000687b0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000687d0 sp=0x140000687d0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 20 gp=0x14000103180 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000068f10 sp=0x14000068ef0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000068fb0 sp=0x14000068f10 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000068fd0 sp=0x14000068fb0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000068fd0 sp=0x14000068fd0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 34 gp=0x14000306000 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030c710 sp=0x1400030c6f0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400030c7b0 sp=0x1400030c710 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400030c7d0 sp=0x1400030c7b0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030c7d0 sp=0x1400030c7d0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 7 gp=0x14000003dc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006f7b0 sp=0x1400006f710 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 8 gp=0x140004ea000 m=nil [GC worker (idle)]: runtime.gopark(0x6bfe385a3160b?, 0x3?, 0x55?, 0x4c?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400006ff10 sp=0x1400006fef0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400006ffb0 sp=0x1400006ff10 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400006ffd0 sp=0x1400006ffb0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400006ffd0 sp=0x1400006ffd0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 9 gp=0x140004ea1c0 m=nil [GC worker (idle)]: runtime.gopark(0x6bfe385a144e0?, 0x3?, 0x0?, 0x7d?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000308710 sp=0x140003086f0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140003087b0 sp=0x14000308710 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140003087d0 sp=0x140003087b0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140003087d0 sp=0x140003087d0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 21 gp=0x14000103340 m=nil [GC worker (idle)]: runtime.gopark(0x6bfe385a2242e?, 0x3?, 0x5a?, 0x6e?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000069710 sp=0x140000696f0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x140000697b0 sp=0x14000069710 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x140000697d0 sp=0x140000697b0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x140000697d0 sp=0x140000697d0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 35 gp=0x140003061c0 m=nil [GC worker (idle)]: runtime.gopark(0x6bfe385a23de6?, 0x3?, 0x1e?, 0x2?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030cf10 sp=0x1400030cef0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400030cfb0 sp=0x1400030cf10 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400030cfd0 sp=0x1400030cfb0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030cfd0 sp=0x1400030cfd0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 10 gp=0x140004ea380 m=nil [GC worker (idle)]: runtime.gopark(0x6bfe385a26302?, 0x1?, 0xcd?, 0x4e?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x14000308f10 sp=0x14000308ef0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x14000308fb0 sp=0x14000308f10 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x14000308fd0 sp=0x14000308fb0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000308fd0 sp=0x14000308fd0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 36 gp=0x14000306380 m=nil [GC worker (idle)]: runtime.gopark(0x6bfe385a23187?, 0x0?, 0x0?, 0x0?, 0x0?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030d710 sp=0x1400030d6f0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400030d7b0 sp=0x1400030d710 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400030d7d0 sp=0x1400030d7b0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030d7d0 sp=0x1400030d7d0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 37 gp=0x14000306540 m=nil [GC worker (idle)]: runtime.gopark(0x6bfe385a22816?, 0x3?, 0xca?, 0x8c?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030df10 sp=0x1400030def0 pc=0x100e56e98 runtime.gcBgMarkWorker(0x1400003b490) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1423 +0xdc fp=0x1400030dfb0 sp=0x1400030df10 pc=0x100e04b2c runtime.gcBgMarkStartWorkers.gowrap1() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x28 fp=0x1400030dfd0 sp=0x1400030dfb0 pc=0x100e04a18 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030dfd0 sp=0x1400030dfd0 pc=0x100e5f414 created by runtime.gcBgMarkStartWorkers in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/mgc.go:1339 +0x140 goroutine 38 gp=0x14000582380 m=nil [select]: runtime.gopark(0x14000045a60?, 0x2?, 0xa?, 0x0?, 0x14000045864?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x140000456b0 sp=0x14000045690 pc=0x100e56e98 runtime.selectgo(0x14000045a60, 0x14000045860, 0x10?, 0x0, 0x1?, 0x1) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/select.go:351 +0x6c4 fp=0x140000457e0 sp=0x140000456b0 pc=0x100e36ad4 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0x14000596140, {0x1025f31a0, 0x1400036a9a0}, 0x140004472c0) /Users/runner/work/ollama/ollama/runner/llamarunner/runner.go:716 +0xa1c fp=0x14000045aa0 sp=0x140000457e0 pc=0x1013499dc github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x1025f31a0?, 0x1400036a9a0?}, 0x14000045b28?) <autogenerated>:1 +0x40 fp=0x14000045ad0 sp=0x14000045aa0 pc=0x10134c600 net/http.HandlerFunc.ServeHTTP(0x1400059a000?, {0x1025f31a0?, 0x1400036a9a0?}, 0x14000045b10?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2294 +0x38 fp=0x14000045b00 sp=0x14000045ad0 pc=0x10111ee28 net/http.(*ServeMux).ServeHTTP(0x10?, {0x1025f31a0, 0x1400036a9a0}, 0x140004472c0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2822 +0x1b4 fp=0x14000045b50 sp=0x14000045b00 pc=0x1011209b4 net/http.serverHandler.ServeHTTP({0x1025ef230?}, {0x1025f31a0?, 0x1400036a9a0?}, 0x1?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3301 +0xbc fp=0x14000045b80 sp=0x14000045b50 pc=0x10113c69c net/http.(*conn).serve(0x140002a8240, {0x1025f59f8, 0x14000592360}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:2102 +0x52c fp=0x14000045fa0 sp=0x14000045b80 pc=0x10111d5cc net/http.(*Server).Serve.gowrap3() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x30 fp=0x14000045fd0 sp=0x14000045fa0 pc=0x101122790 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x14000045fd0 sp=0x14000045fd0 pc=0x100e5f414 created by net/http.(*Server).Serve in goroutine 1 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:3454 +0x3d8 goroutine 82 gp=0x14000583340 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x100e7ac30?) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030fd80 sp=0x1400030fd60 pc=0x100e56e98 runtime.netpollblock(0x0?, 0x0?, 0x0?) 
/Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x1400030fdc0 sp=0x1400030fd80 pc=0x100e1c8f8 internal/poll.runtime_pollWait(0x14b218a38, 0x72) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x1400030fdf0 sp=0x1400030fdc0 pc=0x100e56050 internal/poll.(*pollDesc).wait(0x1400029e000?, 0x14000592041?, 0x0) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400030fe20 sp=0x1400030fdf0 pc=0x100ed6fe8 internal/poll.(*pollDesc).waitRead(...) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x1400029e000, {0x14000592041, 0x1, 0x1}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x1400030fec0 sp=0x1400030fe20 pc=0x100ed829c net.(*netFD).Read(0x1400029e000, {0x14000592041?, 0x1400030ff58?, 0x101118044?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x1400030ff10 sp=0x1400030fec0 pc=0x100f4a0f8 net.(*conn).Read(0x14000122058, {0x14000592041?, 0x0?, 0x0?}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x1400030ff60 sp=0x1400030ff10 pc=0x100f56fc4 net/http.(*connReader).backgroundRead(0x14000592030) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x1400030ffb0 sp=0x1400030ff60 pc=0x101117f40 net/http.(*connReader).startBackgroundRead.gowrap2() /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x1400030ffd0 sp=0x1400030ffb0 pc=0x101117e28 runtime.goexit({}) /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030ffd0 sp=0x1400030ffd0 pc=0x100e5f414 created by net/http.(*connReader).startBackgroundRead in goroutine 38 /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4 r0 0x0 r1 0x0 r2 0x0 r3 0x0 r4 0x183a09df8 r5 0x17398b650 r6 0x36 r7 0x0 r8 0xc171944f973fb752 r9 
0xc171944ee4a68752 r10 0x3bb r11 0x6 r12 0x6 r13 0x17398b382 r14 0x1023263a8 r15 0x1 r16 0x148 r17 0x1f3a40ac0 r18 0x0 r19 0x6 r20 0x3313 r21 0x1739930e0 r22 0x0 r23 0x2 r24 0x152f053d8 r25 0x17398ca08 r26 0xcf80082c0 r27 0xcf8008000 r28 0x1 r29 0x17398bf40 lr 0x183afd88c sp 0x17398bf20 pc 0x183ac4388 fault 0x183ac4388 time=2026-04-02T10:54:06.529-07:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2" time=2026-04-02T10:54:06.529-07:00 level=ERROR source=server.go:1612 msg="post predict" error="Post \"http://127.0.0.1:50238/completion\": EOF" [GIN] 2026/04/02 - 10:54:06 | 500 | 20.05786875s | 127.0.0.1 | POST "/api/chat" ```

@micseydel commented on GitHub (Apr 5, 2026):

I didn't try raw llama.cpp before, but got it working on two machines. The 70b models I tried crawled, to the point of being unusable. Prompts used to take 2-3 minutes through Ollama's web API, but I can't say if direct llama.cpp outside Ollama used to work for my configuration or if I'm actually doing things right.

Apologies if the double-comment+delete caused confusion earlier, I'd be happy to provide llama.cpp logs if you let me know what you do (and maybe don't) want. I appreciate your project, it worked for me for months and was much easier to get going with than raw llama.cpp (which also hasn't saved me 😅).


@rick-github commented on GitHub (Apr 5, 2026):

```
time=2026-04-02T10:11:41.668-07:00 level=INFO source=routes.go:1850 msg="vram-based default context" total_vram="48.0 GiB" default_num_ctx=262144
```

`OLLAMA_CONTEXT_LENGTH` is not set, so the server is using a default of 256k, based on the available VRAM.

```
time=2026-04-02T10:53:39.295-07:00 level=WARN source=server.go:169 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
```

256k is larger than the context the model was trained with, so the context size is set to the maximum supported by the model, 128k.

```
load_tensors: offloaded 30/81 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 40543.11 MiB
load_tensors: Metal_Mapped model buffer size = 39721.13 MiB
```

Because of the large context, only 30 layers fit in the GPU. The model is being loaded with the llama.cpp engine, which sometimes gets the memory estimation wrong, so what I think is happening here is that the runner is trying to allocate more memory than is available on the GPU. Ways to deal with OOMs are shown [here](https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288), but the easiest thing to do would be to set `OLLAMA_CONTEXT_LENGTH` to something more reasonable. If you do need a larger context, set `OLLAMA_NEW_ENGINE=1` to get more accurate memory estimation.
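
To see why a 128k context is so expensive here, a rough back-of-the-envelope estimate of the KV cache may help. This is only a sketch, not Ollama's own memory estimator: it assumes the published Llama 3 70B shape (80 transformer layers, 8 KV heads, head dimension 128) and an unquantized f16 cache, and the function name `kv_cache_bytes` is made up for illustration.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_ctx, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    Each token stores one K and one V vector per layer, each of
    n_kv_heads * head_dim elements. The defaults are assumptions
    (Llama 3 70B shape, f16 cache), not values read from Ollama.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_ctx * per_token

# Compare a few context sizes, including the 131072 the log clamps to.
for n_ctx in (8192, 32768, 131072):
    print(f"num_ctx={n_ctx:>6}: ~{kv_cache_bytes(n_ctx) / GIB:.1f} GiB of KV cache")
```

Under these assumptions the cache alone comes to about 40 GiB at 131072 tokens, against the 48.0 GiB of VRAM reported in the log, before any model weights are counted. At 8192 tokens it is about 2.5 GiB.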

@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15228
Analyzed: 2026-04-18T18:22:53.126160

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


@micseydel commented on GitHub (Apr 18, 2026):

Changing the context size in the Ollama GUI got things working again. (I also tinkered more with raw llama.cpp and couldn't get it working.)

If you think this ticket could be resolved by someone new to the codebase, I might be down to volunteer. I'd prefer newcomers have a "just works" experience like I did when I first tried Ollama.
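
For anyone hitting this through the web API rather than the GUI, the context size can also be capped per request with the `num_ctx` option. A minimal sketch, assuming a local server on the default port 11434; the helper names and the 8192 value are illustrative, not something taken from this thread.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"  # assumed default local address

def build_chat_request(prompt, model="llama3.3:70b", num_ctx=8192):
    """Build (but do not send) a chat request that caps the context window."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,                  # ask for a single JSON response
        "options": {"num_ctx": num_ctx},  # per-request context size
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(prompt, **kwargs):
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        return json.load(resp)["message"]["content"]
```

Calling `ask("what color is the sky?")` needs a running Ollama server; `build_chat_request` can be inspected without one.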

Reference: github-starred/ollama#56250