[GH-ISSUE #12521] Hit error 500 when running 0.12.4-rc6 on nvidia jetpack 6 #34070

Closed
opened 2026-04-22 17:18:49 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @kentsuiGitHub on GitHub (Oct 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12521

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Hit error 500 when running 0.12.4-rc6 on nvidia jetpack 6

Traceback (most recent call last):
File "/test/test.py", line 16, in <module>
ollama.chat(model=args.model, messages=[{"role":"user", "content":args.prompt}])
File "/usr/local/lib/python3.10/dist-packages/ollama/_client.py", line 351, in chat
return self._request(
File "/usr/local/lib/python3.10/dist-packages/ollama/_client.py", line 189, in _request
return cls(**self._request_raw(*args, **kwargs).json())
File "/usr/local/lib/python3.10/dist-packages/ollama/_client.py", line 133, in _request_raw
raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: do load request: Post "http://127.0.0.1:43111/load": EOF (status code: 500)
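For reference, the failing call can be reproduced without the `ollama` package: `ollama.chat()` is a thin wrapper over `POST /api/chat`. The sketch below shows the equivalent raw request, assuming a local server on the default port; the model name `"gpt-oss:20b"` is a placeholder guess based on the `gptoss` architecture in the log and should be replaced with the actual model.

```python
# Minimal sketch of the failing call path, assuming a local Ollama server on
# the default port (11434). "gpt-oss:20b" is a placeholder model name.
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"

def build_chat_payload(model: str, prompt: str) -> dict:
    """Build the same JSON body that ollama.chat() sends to POST /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str) -> dict:
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=json.dumps(build_chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    # On this bug, the server answers 500 here, which the Python client
    # surfaces as ollama._types.ResponseError.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(chat("gpt-oss:20b", "hello"))
```

This isolates whether the 500 comes from the server itself rather than the Python client.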

Relevant log output


OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.12.4-rc6

GiteaMirror added the nvidia, bug labels 2026-04-22 17:18:50 -05:00

@rick-github commented on GitHub (Oct 7, 2025):

[Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help in debugging.
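The troubleshooting doc boils down to capturing the server's stderr. A hedged sketch of common collection commands follows; the systemd service name, container name, and log path are assumptions that depend on how Ollama was installed:

```shell
# systemd install: dump the service journal
journalctl -e -u ollama > server.log

# container install: dump container logs (substitute your container name)
docker logs ollama > server.log

# foreground run with verbose logging
OLLAMA_DEBUG=1 ollama serve 2>&1 | tee server.log
```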

<!-- gh-comment-id:3375750675 -->

@kentsuiGitHub commented on GitHub (Oct 7, 2025):

[0.12.4-rc6.log](https://github.com/user-attachments/files/22745355/0.12.4-rc6.log)

<!-- gh-comment-id:3376968949 -->

@kentsuiGitHub commented on GitHub (Oct 7, 2025):

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJH0HBrdj8sciXEkACu5seTlPxZtaCLyI8F0jYJ9OB9C

time=2025-10-07T13:41:50.163Z level=INFO source=routes.go:1475 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2h0m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:3 OLLAMA_MAX_QUEUE:256 OLLAMA_MODELS:/ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:true OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-10-07T13:41:50.172Z level=INFO source=images.go:522 msg="total blobs: 100"
time=2025-10-07T13:41:50.175Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-10-07T13:41:50.176Z level=INFO source=routes.go:1528 msg="Listening on [::]:11434 (version 0.12.4-rc6)"
time=2025-10-07T13:41:50.177Z level=INFO source=runner.go:80 msg="discovering available GPUs..."
time=2025-10-07T13:41:50.778Z level=INFO source=types.go:111 msg="inference compute" id=GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 library=CUDA compute=8.7 name=CUDA0 description=Orin libdirs=ollama,cuda_v12 driver=12.6 pci_id=00:00.0 type=iGPU total="61.4 GiB" available="59.2 GiB"
[GIN] 2025/10/07 - 13:42:00 | 200 | 171.577µs | 127.0.0.1 | GET  "/api/version"
[GIN] 2025/10/07 - 13:42:10 | 200 | 68.797µs | 127.0.0.1 | HEAD  "/"
[GIN] 2025/10/07 - 13:42:10 | 200 | 230.212497ms | 127.0.0.1 | POST  "/api/show"
time=2025-10-07T13:42:11.600Z level=INFO source=server.go:216 msg="enabling flash attention"
time=2025-10-07T13:42:11.601Z level=INFO source=server.go:395 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /ollama/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --port 42003"
time=2025-10-07T13:42:11.602Z level=INFO source=server.go:670 msg="loading model" "model layers"=25 requested=-1
time=2025-10-07T13:42:11.603Z level=INFO source=server.go:676 msg="system memory" total="61.4 GiB" free="59.5 GiB" free_swap="30.6 GiB"
time=2025-10-07T13:42:11.603Z level=INFO source=server.go:684 msg="gpu memory" id=GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 library=CUDA available="58.8 GiB" free="59.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-10-07T13:42:11.620Z level=INFO source=runner.go:1299 msg="starting ollama engine"
time=2025-10-07T13:42:11.625Z level=INFO source=runner.go:1335 msg="Server listening on 127.0.0.1:42003"
time=2025-10-07T13:42:11.626Z level=INFO source=runner.go:1172 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType:q8_0 NumThreads:12 GPULayers:25[ID:GPU-39bc44a2-5293-57f9-a322-74b8fccb5150 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-07T13:42:11.742Z level=INFO source=ggml.go:133 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes, ID: GPU-39bc44a2-5293-57f9-a322-74b8fccb5150
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-10-07T13:42:11.919Z level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
CUDA error: the resource allocation failed
current device: 0, in function cublas_handle at //ml/backend/ggml/ggml/src/ggml-cuda/common.cuh:1041
cublasCreate_v2(&cublas_handles[device])
//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:88: CUDA error
[New LWP 154]
[New LWP 155]
[New LWP 156]
[New LWP 157]
[New LWP 158]
[New LWP 159]
[New LWP 160]
[New LWP 161]
[New LWP 162]
[New LWP 163]
[New LWP 164]
[New LWP 165]
[New LWP 166]
[New LWP 167]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000aaaab324108c in ?? ()
#0 0x0000aaaab324108c in ?? ()
#1 0x0000000000000080 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
[Inferior 1 (process 153) detached]
SIGABRT: abort
PC=0xffff814a2008 m=8 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 9 gp=0x4000582fc0 m=8 mp=0x4000600008 [syscall]:
runtime.cgocall(0xaaaab3d749a8, 0x400004b0d8)
runtime/cgocall.go:167 +0x44 fp=0x400004b090 sp=0x400004b050 pc=0xaaaab3234da4
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_reserve(0xffff14790c20, 0xffff08a67760)
_cgo_gotypes.go:996 +0x34 fp=0x400004b0d0 sp=0x400004b090 pc=0xaaaab3610d04
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve.func1(...)
github.com/ollama/ollama/ml/backend/ggml/ggml.go:828
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0x4001576e80)
github.com/ollama/ollama/ml/backend/ggml/ggml.go:828 +0x8c fp=0x400004b350 sp=0x400004b0d0 pc=0xaaaab361a7dc
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0x400022f0e0)
github.com/ollama/ollama/runner/ollamarunner/runner.go:1068 +0xa18 fp=0x400004b680 sp=0x400004b350 pc=0xaaaab36b6e58
github.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0x400022f0e0, {0xfffff7b95570?, 0x0?}, {0x0, 0xc, {0x40002fe080, 0x1, 0x1}, 0x1}, {0x0?, ...}, ...)
github.com/ollama/ollama/runner/ollamarunner/runner.go:1125 +0x22c fp=0x400004b710 sp=0x400004b680 pc=0xaaaab36b725c
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0x400022f0e0, {0xaaaab454ae28, 0x4000232000}, 0x4000178000)
github.com/ollama/ollama/runner/ollamarunner/runner.go:1199 +0x460 fp=0x400004baa0 sp=0x400004b710 pc=0xaaaab36b7b30
github.com/ollama/ollama/runner/ollamarunner.(*Server).load-fm({0xaaaab454ae28?, 0x4000232000?}, 0x40005a1b28?)
<autogenerated>:1 +0x40 fp=0x400004bad0 sp=0x400004baa0 pc=0xaaaab36b9a60
net/http.HandlerFunc.ServeHTTP(0x40000c6c00?, {0xaaaab454ae28?, 0x4000232000?}, 0x40005a1b10?)
net/http/server.go:2294 +0x38 fp=0x400004bb00 sp=0x400004bad0 pc=0xaaaab34f01f8
net/http.(*ServeMux).ServeHTTP(0x10?, {0xaaaab454ae28, 0x4000232000}, 0x4000178000)
net/http/server.go:2822 +0x1b4 fp=0x400004bb50 sp=0x400004bb00 pc=0xaaaab34f1d84
net/http.serverHandler.ServeHTTP({0xaaaab4547430?}, {0xaaaab454ae28?, 0x4000232000?}, 0x1?)
net/http/server.go:3301 +0xbc fp=0x400004bb80 sp=0x400004bb50 pc=0xaaaab350da6c
net/http.(*conn).serve(0x40000e83f0, {0xaaaab454d188, 0x40000e4c00})
net/http/server.go:2102 +0x52c fp=0x400004bfa0 sp=0x400004bb80 pc=0xaaaab34ee99c
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3454 +0x30 fp=0x400004bfd0 sp=0x400004bfa0 pc=0xaaaab34f3b60
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x400004bfd0 sp=0x400004bfd0 pc=0xaaaab323fc34
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3454 +0x3d8

goroutine 1 gp=0x40000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000fb5710 sp=0x4000fb56f0 pc=0xaaaab32382b8
runtime.netpollblock(0x7000000000?, 0x6?, 0x0?)
runtime/netpoll.go:575 +0x158 fp=0x4000fb5750 sp=0x4000fb5710 pc=0xaaaab31fcc28
internal/poll.runtime_pollWait(0xffff398f7de0, 0x72)
runtime/netpoll.go:351 +0xa0 fp=0x4000fb5780 sp=0x4000fb5750 pc=0xaaaab3237470
internal/poll.(*pollDesc).wait(0x40000e2580?, 0xaaaab32bfac8?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x4000fb57b0 sp=0x4000fb5780 pc=0xaaaab32b9068
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x40000e2580)
internal/poll/fd_unix.go:620 +0x24c fp=0x4000fb5860 sp=0x4000fb57b0 pc=0xaaaab32bd93c
net.(*netFD).accept(0x40000e2580)
net/fd_unix.go:172 +0x28 fp=0x4000fb5920 sp=0x4000fb5860 pc=0xaaaab332c398
net.(*TCPListener).accept(0x400051d580)
net/tcpsock_posix.go:159 +0x24 fp=0x4000fb5970 sp=0x4000fb5920 pc=0xaaaab33412b4
net.(*TCPListener).Accept(0x400051d580)
net/tcpsock.go:380 +0x2c fp=0x4000fb59b0 sp=0x4000fb5970 pc=0xaaaab334024c
net/http.(*onceCloseListener).Accept(0x40000e83f0?)
<autogenerated>:1 +0x30 fp=0x4000fb59d0 sp=0x4000fb59b0 pc=0xaaaab351a090
net/http.(*Server).Serve(0x40000ab100, {0xaaaab454ac48, 0x400051d580})
net/http/server.go:3424 +0x290 fp=0x4000fb5b00 sp=0x4000fb59d0 pc=0xaaaab34f37d0
github.com/ollama/ollama/runner/ollamarunner.Execute({0x4000134030, 0x4, 0x4})
github.com/ollama/ollama/runner/ollamarunner/runner.go:1336 +0x824 fp=0x4000fb5ce0 sp=0x4000fb5b00 pc=0xaaaab36b9484
github.com/ollama/ollama/runner.Execute({0x4000134010?, 0x0?, 0x0?})
github.com/ollama/ollama/runner/runner.go:20 +0x120 fp=0x4000fb5d10 sp=0x4000fb5ce0 pc=0xaaaab36b9d80
github.com/ollama/ollama/cmd.NewCLI.func2(0x40000aaf00?, {0xaaaab40410a4?, 0x4?, 0xaaaab40410a8?})
github.com/ollama/ollama/cmd/cmd.go:1769 +0x54 fp=0x4000fb5d40 sp=0x4000fb5d10 pc=0xaaaab3d24924
github.com/spf13/cobra.(*Command).execute(0x40000eb508, {0x4000521540, 0x5, 0x5})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x4000fb5e60 sp=0x4000fb5d40 pc=0xaaaab339b798
github.com/spf13/cobra.(*Command).ExecuteC(0x40000c8908)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x4000fb5f20 sp=0x4000fb5e60 pc=0xaaaab339bee0
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
github.com/ollama/ollama/main.go:12 +0x54 fp=0x4000fb5f40 sp=0x4000fb5f20 pc=0xaaaab3d25464
runtime.main()
runtime/proc.go:283 +0x284 fp=0x4000fb5fd0 sp=0x4000fb5f40 pc=0xaaaab3203fd4
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x4000fb5fd0 sp=0x4000fb5fd0 pc=0xaaaab323fc34

goroutine 2 gp=0x4000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000074f90 sp=0x4000074f70 pc=0xaaaab32382b8
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.forcegchelper()
runtime/proc.go:348 +0xb8 fp=0x4000074fd0 sp=0x4000074f90 pc=0xaaaab3204328
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x4000074fd0 sp=0x4000074fd0 pc=0xaaaab323fc34
created by runtime.init.7 in goroutine 1
runtime/proc.go:336 +0x24

goroutine 18 gp=0x4000102380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000070760 sp=0x4000070740 pc=0xaaaab32382b8
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.bgsweep(0x4000110000)
runtime/mgcsweep.go:316 +0x108 fp=0x40000707b0 sp=0x4000070760 pc=0xaaaab31eeb58
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x28 fp=0x40000707d0 sp=0x40000707b0 pc=0xaaaab31e2988
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x40000707d0 sp=0x40000707d0 pc=0xaaaab323fc34
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x6c

goroutine 19 gp=0x4000102540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0xaaaab4202230?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000070f60 sp=0x4000070f40 pc=0xaaaab32382b8
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.(*scavengerState).park(0xaaaab4dafd60)
runtime/mgcscavenge.go:425 +0x5c fp=0x4000070f90 sp=0x4000070f60 pc=0xaaaab31ec61c
runtime.bgscavenge(0x4000110000)
runtime/mgcscavenge.go:658 +0xac fp=0x4000070fb0 sp=0x4000070f90 pc=0xaaaab31ecb9c
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x28 fp=0x4000070fd0 sp=0x4000070fb0 pc=0xaaaab31e2928
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x4000070fd0 sp=0x4000070fd0 pc=0xaaaab323fc34
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xac

goroutine 20 gp=0x4000102a80 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000071590 sp=0x4000071570 pc=0xaaaab32382b8
runtime.runfinq()
runtime/mfinal.go:196 +0x108 fp=0x40000717d0 sp=0x4000071590 pc=0xaaaab31e1988
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x40000717d0 sp=0x40000717d0 pc=0xaaaab323fc34
created by runtime.createfing in goroutine 1
runtime/mfinal.go:166 +0x80

goroutine 21 gp=0x4000103500 m=nil [chan receive]:
runtime.gopark(0x4000183a40?, 0x4000172180?, 0x48?, 0x1f?, 0xaaaab3304958?)
runtime/proc.go:435 +0xc8 fp=0x4000071ef0 sp=0x4000071ed0 pc=0xaaaab32382b8
runtime.chanrecv(0x4000118310, 0x0, 0x1)
runtime/chan.go:664 +0x42c fp=0x4000071f70 sp=0x4000071ef0 pc=0xaaaab31d391c
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:506 +0x14 fp=0x4000071fa0 sp=0x4000071f70 pc=0xaaaab31d34b4
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1799 +0x3c fp=0x4000071fd0 sp=0x4000071fa0 pc=0xaaaab31e5bac
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x4000071fd0 sp=0x4000071fd0 pc=0xaaaab323fc34
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1794 +0x78

goroutine 22 gp=0x4000103880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000072710 sp=0x40000726f0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x40000727b0 sp=0x4000072710 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x40000727d0 sp=0x40000727b0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x40000727d0 sp=0x40000727d0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 34 gp=0x4000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x400050a710 sp=0x400050a6f0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x400050a7b0 sp=0x400050a710 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x400050a7d0 sp=0x400050a7b0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x400050a7d0 sp=0x400050a7d0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 3 gp=0x4000003500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0xaaaab4d38118?, 0x18?, 0x81?, 0xaaaab4db2540?)
runtime/proc.go:435 +0xc8 fp=0x4000074710 sp=0x40000746f0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x40000747b0 sp=0x4000074710 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x40000747d0 sp=0x40000747b0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x40000747d0 sp=0x40000747d0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 4 gp=0x40000036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000075710 sp=0x40000756f0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x40000757b0 sp=0x4000075710 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x40000757d0 sp=0x40000757b0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x40000757d0 sp=0x40000757d0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 35 gp=0x40005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x124249473154?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x400050af10 sp=0x400050aef0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x400050afb0 sp=0x400050af10 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x400050afd0 sp=0x400050afb0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x400050afd0 sp=0x400050afd0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 23 gp=0x4000103a40 m=nil [GC worker (idle)]:
runtime.gopark(0xaaaab4e65ee0?, 0x1?, 0x22?, 0x49?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000072f10 sp=0x4000072ef0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x4000072fb0 sp=0x4000072f10 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x4000072fd0 sp=0x4000072fb0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x4000072fd0 sp=0x4000072fd0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 5 gp=0x4000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x12424907be08?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000075f10 sp=0x4000075ef0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x4000075fb0 sp=0x4000075f10 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x4000075fd0 sp=0x4000075fb0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x4000075fd0 sp=0x4000075fd0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 36 gp=0x4000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x12424907cb28?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000088f10 sp=0x4000088ef0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x4000088fb0 sp=0x4000088f10 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x4000088fd0 sp=0x4000088fb0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x4000088fd0 sp=0x4000088fd0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 24 gp=0x4000103c00 m=nil [GC worker (idle)]:
runtime.gopark(0x12424907c808?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000073710 sp=0x40000736f0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x40000737b0 sp=0x4000073710 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x40000737d0 sp=0x40000737b0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x40000737d0 sp=0x40000737d0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 6 gp=0x4000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x124249530e53?, 0x3?, 0xf2?, 0xa9?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000076710 sp=0x40000766f0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x40000767b0 sp=0x4000076710 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x40000767d0 sp=0x40000767b0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x40000767d0 sp=0x40000767d0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 37 gp=0x4000504540 m=nil [GC worker (idle)]:
runtime.gopark(0x12424907ca68?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x400050bf10 sp=0x400050bef0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x400050bfb0 sp=0x400050bf10 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x400050bfd0 sp=0x400050bfb0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x400050bfd0 sp=0x400050bfd0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 7 gp=0x4000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x124249594281?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000076f10 sp=0x4000076ef0 pc=0xaaaab32382b8
runtime.gcBgMarkWorker(0x4000119730)
runtime/mgc.go:1423 +0xdc fp=0x4000076fb0 sp=0x4000076f10 pc=0xaaaab31e4e1c
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x28 fp=0x4000076fd0 sp=0x4000076fb0 pc=0xaaaab31e4d08
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x4000076fd0 sp=0x4000076fd0 pc=0xaaaab323fc34
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x140

goroutine 8 gp=0x4000582e00 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0xaaaab4dc0160?, 0xaaaab4304532?, 0xc0?, 0x40?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x400008db30 sp=0x400008db10 pc=0xaaaab32382b8
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.semacquire1(0x400022f198, 0x0, 0x1, 0x0, 0x18)
runtime/sema.go:188 +0x204 fp=0x400008db80 sp=0x400008db30 pc=0xaaaab3218474
sync.runtime_SemacquireWaitGroup(0x0?)
runtime/sema.go:110 +0x2c fp=0x400008dbc0 sp=0x400008db80 pc=0xaaaab3239d5c
sync.(*WaitGroup).Wait(0x400022f190)
sync/waitgroup.go:118 +0x70 fp=0x400008dbe0 sp=0x400008dbc0 pc=0xaaaab324b1f0
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x400022f0e0, {0xaaaab454d1c0, 0x40005215e0})
github.com/ollama/ollama/runner/ollamarunner/runner.go:407 +0x38 fp=0x400008dfa0 sp=0x400008dbe0 pc=0xaaaab36b1cc8
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
github.com/ollama/ollama/runner/ollamarunner/runner.go:1313 +0x30 fp=0x400008dfd0 sp=0x400008dfa0 pc=0xaaaab36b96b0
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x400008dfd0 sp=0x400008dfd0 pc=0xaaaab323fc34
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
github.com/ollama/ollama/runner/ollamarunner/runner.go:1313 +0x470

goroutine 38 gp=0x40004ae380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xc8 fp=0x4000506580 sp=0x4000506560 pc=0xaaaab32382b8
runtime.netpollblock(0x0?, 0xffffffff?, 0xff?)
runtime/netpoll.go:575 +0x158 fp=0x40005065c0 sp=0x4000506580 pc=0xaaaab31fcc28
internal/poll.runtime_pollWait(0xffff398f7cc8, 0x72)
runtime/netpoll.go:351 +0xa0 fp=0x40005065f0 sp=0x40005065c0 pc=0xaaaab3237470
internal/poll.(*pollDesc).wait(0x40000e2600?, 0x4000590041?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x4000506620 sp=0x40005065f0 pc=0xaaaab32b9068
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x40000e2600, {0x4000590041, 0x1, 0x1})
internal/poll/fd_unix.go:165 +0x1fc fp=0x40005066c0 sp=0x4000506620 pc=0xaaaab32ba31c
net.(*netFD).Read(0x40000e2600, {0x4000590041?, 0x0?, 0x0?})
net/fd_posix.go:55 +0x28 fp=0x4000506710 sp=0x40005066c0 pc=0xaaaab332a968
net.(*conn).Read(0x4000122a08, {0x4000590041?, 0x0?, 0x0?})
net/net.go:194 +0x34 fp=0x4000506760 sp=0x4000506710 pc=0xaaaab33380a4
net/http.(*connReader).backgroundRead(0x4000590030)
net/http/server.go:690 +0x40 fp=0x40005067b0 sp=0x4000506760 pc=0xaaaab34e9310
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x28 fp=0x40005067d0 sp=0x40005067b0 pc=0xaaaab34e91f8
runtime.goexit({})
runtime/asm_arm64.s:1223 +0x4 fp=0x40005067d0 sp=0x40005067d0 pc=0xaaaab323fc34
created by net/http.(*connReader).startBackgroundRead in goroutine 9
net/http/server.go:686 +0xc4

r0 0x0
r1 0xa0
r2 0x6
r3 0xffff32fdf0c0
r4 0xffff8199cb58
r5 0xffff32fddc20
r6 0xfffffff8
r7 0xffff32fddc00
r8 0x83
r9 0x0
r10 0x6e6f6974616c7563
r11 0x101010101010101
r12 0x746977735f747874
r13 0x0
r14 0xf0b07021a1e0d06
r15 0x30
r16 0x1
r17 0xffff8144704c
r18 0x2
r19 0xa0
r20 0xffff32fdf0c0
r21 0x6
r22 0xfffefffef000
r23 0x0
r24 0xffff147c99a8
r25 0x4000063e78
r26 0xaaaab4536da0
r27 0x10
r28 0x40006021c0
r29 0xffff32fddb10
lr 0xffff814a1ff4
sp 0xffff32fdda80
pc 0xffff814a2008
fault 0x0
time=2025-10-07T13:42:12.503Z level=INFO source=sched.go:448 msg="Load failed" model=/ollama/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 error="do load request: Post "http://127.0.0.1:42003/load": EOF"
time=2025-10-07T13:42:12.505Z level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2"
[GIN] 2025/10/07 - 13:42:12 | 500 | 2.086801712s | 127.0.0.1 | POST  "/api/generate"

<!-- gh-comment-id:3376971309 -->
<autogenerated>:1 +0x30 fp=0x4000fb59d0 sp=0x4000fb59b0 pc=0xaaaab351a090 net/http.(*Server).Serve(0x40000ab100, {0xaaaab454ac48, 0x400051d580}) net/http/server.go:3424 +0x290 fp=0x4000fb5b00 sp=0x4000fb59d0 pc=0xaaaab34f37d0 github.com/ollama/ollama/runner/ollamarunner.Execute({0x4000134030, 0x4, 0x4}) github.com/ollama/ollama/runner/ollamarunner/runner.go:1336 +0x824 fp=0x4000fb5ce0 sp=0x4000fb5b00 pc=0xaaaab36b9484 github.com/ollama/ollama/runner.Execute({0x4000134010?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:20 +0x120 fp=0x4000fb5d10 sp=0x4000fb5ce0 pc=0xaaaab36b9d80 github.com/ollama/ollama/cmd.NewCLI.func2(0x40000aaf00?, {0xaaaab40410a4?, 0x4?, 0xaaaab40410a8?}) github.com/ollama/ollama/cmd/cmd.go:1769 +0x54 fp=0x4000fb5d40 sp=0x4000fb5d10 pc=0xaaaab3d24924 github.com/spf13/cobra.(*Command).execute(0x40000eb508, {0x4000521540, 0x5, 0x5}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x4000fb5e60 sp=0x4000fb5d40 pc=0xaaaab339b798 github.com/spf13/cobra.(*Command).ExecuteC(0x40000c8908) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x4000fb5f20 sp=0x4000fb5e60 pc=0xaaaab339bee0 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x54 fp=0x4000fb5f40 sp=0x4000fb5f20 pc=0xaaaab3d25464 runtime.main() runtime/proc.go:283 +0x284 fp=0x4000fb5fd0 sp=0x4000fb5f40 pc=0xaaaab3203fd4 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000fb5fd0 sp=0x4000fb5fd0 pc=0xaaaab323fc34 goroutine 2 gp=0x4000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000074f90 sp=0x4000074f70 pc=0xaaaab32382b8 runtime.goparkunlock(...) 
runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0x4000074fd0 sp=0x4000074f90 pc=0xaaaab3204328 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000074fd0 sp=0x4000074fd0 pc=0xaaaab323fc34 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x24 goroutine 18 gp=0x4000102380 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000070760 sp=0x4000070740 pc=0xaaaab32382b8 runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0x4000110000) runtime/mgcsweep.go:316 +0x108 fp=0x40000707b0 sp=0x4000070760 pc=0xaaaab31eeb58 runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x28 fp=0x40000707d0 sp=0x40000707b0 pc=0xaaaab31e2988 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000707d0 sp=0x40000707d0 pc=0xaaaab323fc34 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x6c goroutine 19 gp=0x4000102540 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0xaaaab4202230?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000070f60 sp=0x4000070f40 pc=0xaaaab32382b8 runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0xaaaab4dafd60) runtime/mgcscavenge.go:425 +0x5c fp=0x4000070f90 sp=0x4000070f60 pc=0xaaaab31ec61c runtime.bgscavenge(0x4000110000) runtime/mgcscavenge.go:658 +0xac fp=0x4000070fb0 sp=0x4000070f90 pc=0xaaaab31ecb9c runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x28 fp=0x4000070fd0 sp=0x4000070fb0 pc=0xaaaab31e2928 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000070fd0 sp=0x4000070fd0 pc=0xaaaab323fc34 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xac goroutine 20 gp=0x4000102a80 m=nil [finalizer wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xc8 fp=0x4000071590 sp=0x4000071570 pc=0xaaaab32382b8 runtime.runfinq() runtime/mfinal.go:196 +0x108 fp=0x40000717d0 sp=0x4000071590 pc=0xaaaab31e1988 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000717d0 sp=0x40000717d0 pc=0xaaaab323fc34 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x80 goroutine 21 gp=0x4000103500 m=nil [chan receive]: runtime.gopark(0x4000183a40?, 0x4000172180?, 0x48?, 0x1f?, 0xaaaab3304958?) runtime/proc.go:435 +0xc8 fp=0x4000071ef0 sp=0x4000071ed0 pc=0xaaaab32382b8 runtime.chanrecv(0x4000118310, 0x0, 0x1) runtime/chan.go:664 +0x42c fp=0x4000071f70 sp=0x4000071ef0 pc=0xaaaab31d391c runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x14 fp=0x4000071fa0 sp=0x4000071f70 pc=0xaaaab31d34b4 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x3c fp=0x4000071fd0 sp=0x4000071fa0 pc=0xaaaab31e5bac runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000071fd0 sp=0x4000071fd0 pc=0xaaaab323fc34 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x78 goroutine 22 gp=0x4000103880 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000072710 sp=0x40000726f0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x40000727b0 sp=0x4000072710 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40000727d0 sp=0x40000727b0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000727d0 sp=0x40000727d0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 34 gp=0x4000504000 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xc8 fp=0x400050a710 sp=0x400050a6f0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x400050a7b0 sp=0x400050a710 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050a7d0 sp=0x400050a7b0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050a7d0 sp=0x400050a7d0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 3 gp=0x4000003500 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0xaaaab4d38118?, 0x18?, 0x81?, 0xaaaab4db2540?) runtime/proc.go:435 +0xc8 fp=0x4000074710 sp=0x40000746f0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x40000747b0 sp=0x4000074710 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40000747d0 sp=0x40000747b0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000747d0 sp=0x40000747d0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 4 gp=0x40000036c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000075710 sp=0x40000756f0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x40000757b0 sp=0x4000075710 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40000757d0 sp=0x40000757b0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000757d0 sp=0x40000757d0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 35 gp=0x40005041c0 m=nil [GC worker (idle)]: runtime.gopark(0x124249473154?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xc8 fp=0x400050af10 sp=0x400050aef0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x400050afb0 sp=0x400050af10 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050afd0 sp=0x400050afb0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050afd0 sp=0x400050afd0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 23 gp=0x4000103a40 m=nil [GC worker (idle)]: runtime.gopark(0xaaaab4e65ee0?, 0x1?, 0x22?, 0x49?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000072f10 sp=0x4000072ef0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x4000072fb0 sp=0x4000072f10 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000072fd0 sp=0x4000072fb0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000072fd0 sp=0x4000072fd0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 5 gp=0x4000003880 m=nil [GC worker (idle)]: runtime.gopark(0x12424907be08?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000075f10 sp=0x4000075ef0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x4000075fb0 sp=0x4000075f10 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000075fd0 sp=0x4000075fb0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000075fd0 sp=0x4000075fd0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 36 gp=0x4000504380 m=nil [GC worker (idle)]: runtime.gopark(0x12424907cb28?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xc8 fp=0x4000088f10 sp=0x4000088ef0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x4000088fb0 sp=0x4000088f10 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000088fd0 sp=0x4000088fb0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000088fd0 sp=0x4000088fd0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 24 gp=0x4000103c00 m=nil [GC worker (idle)]: runtime.gopark(0x12424907c808?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000073710 sp=0x40000736f0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x40000737b0 sp=0x4000073710 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40000737d0 sp=0x40000737b0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000737d0 sp=0x40000737d0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 6 gp=0x4000003a40 m=nil [GC worker (idle)]: runtime.gopark(0x124249530e53?, 0x3?, 0xf2?, 0xa9?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000076710 sp=0x40000766f0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x40000767b0 sp=0x4000076710 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40000767d0 sp=0x40000767b0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000767d0 sp=0x40000767d0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 37 gp=0x4000504540 m=nil [GC worker (idle)]: runtime.gopark(0x12424907ca68?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xc8 fp=0x400050bf10 sp=0x400050bef0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x400050bfb0 sp=0x400050bf10 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050bfd0 sp=0x400050bfb0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050bfd0 sp=0x400050bfd0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 7 gp=0x4000003c00 m=nil [GC worker (idle)]: runtime.gopark(0x124249594281?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000076f10 sp=0x4000076ef0 pc=0xaaaab32382b8 runtime.gcBgMarkWorker(0x4000119730) runtime/mgc.go:1423 +0xdc fp=0x4000076fb0 sp=0x4000076f10 pc=0xaaaab31e4e1c runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000076fd0 sp=0x4000076fb0 pc=0xaaaab31e4d08 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000076fd0 sp=0x4000076fd0 pc=0xaaaab323fc34 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 8 gp=0x4000582e00 m=nil [sync.WaitGroup.Wait]: runtime.gopark(0xaaaab4dc0160?, 0xaaaab4304532?, 0xc0?, 0x40?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x400008db30 sp=0x400008db10 pc=0xaaaab32382b8 runtime.goparkunlock(...) runtime/proc.go:441 runtime.semacquire1(0x400022f198, 0x0, 0x1, 0x0, 0x18) runtime/sema.go:188 +0x204 fp=0x400008db80 sp=0x400008db30 pc=0xaaaab3218474 sync.runtime_SemacquireWaitGroup(0x0?) 
runtime/sema.go:110 +0x2c fp=0x400008dbc0 sp=0x400008db80 pc=0xaaaab3239d5c sync.(*WaitGroup).Wait(0x400022f190) sync/waitgroup.go:118 +0x70 fp=0x400008dbe0 sp=0x400008dbc0 pc=0xaaaab324b1f0 github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x400022f0e0, {0xaaaab454d1c0, 0x40005215e0}) github.com/ollama/ollama/runner/ollamarunner/runner.go:407 +0x38 fp=0x400008dfa0 sp=0x400008dbe0 pc=0xaaaab36b1cc8 github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1() github.com/ollama/ollama/runner/ollamarunner/runner.go:1313 +0x30 fp=0x400008dfd0 sp=0x400008dfa0 pc=0xaaaab36b96b0 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400008dfd0 sp=0x400008dfd0 pc=0xaaaab323fc34 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:1313 +0x470 goroutine 38 gp=0x40004ae380 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000506580 sp=0x4000506560 pc=0xaaaab32382b8 runtime.netpollblock(0x0?, 0xffffffff?, 0xff?) runtime/netpoll.go:575 +0x158 fp=0x40005065c0 sp=0x4000506580 pc=0xaaaab31fcc28 internal/poll.runtime_pollWait(0xffff398f7cc8, 0x72) runtime/netpoll.go:351 +0xa0 fp=0x40005065f0 sp=0x40005065c0 pc=0xaaaab3237470 internal/poll.(*pollDesc).wait(0x40000e2600?, 0x4000590041?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x4000506620 sp=0x40005065f0 pc=0xaaaab32b9068 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x40000e2600, {0x4000590041, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x1fc fp=0x40005066c0 sp=0x4000506620 pc=0xaaaab32ba31c net.(*netFD).Read(0x40000e2600, {0x4000590041?, 0x0?, 0x0?}) net/fd_posix.go:55 +0x28 fp=0x4000506710 sp=0x40005066c0 pc=0xaaaab332a968 net.(*conn).Read(0x4000122a08, {0x4000590041?, 0x0?, 0x0?}) net/net.go:194 +0x34 fp=0x4000506760 sp=0x4000506710 pc=0xaaaab33380a4 net/http.(*connReader).backgroundRead(0x4000590030) net/http/server.go:690 +0x40 fp=0x40005067b0 sp=0x4000506760 pc=0xaaaab34e9310 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x28 fp=0x40005067d0 sp=0x40005067b0 pc=0xaaaab34e91f8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40005067d0 sp=0x40005067d0 pc=0xaaaab323fc34 created by net/http.(*connReader).startBackgroundRead in goroutine 9 net/http/server.go:686 +0xc4 r0 0x0 r1 0xa0 r2 0x6 r3 0xffff32fdf0c0 r4 0xffff8199cb58 r5 0xffff32fddc20 r6 0xfffffff8 r7 0xffff32fddc00 r8 0x83 r9 0x0 r10 0x6e6f6974616c7563 r11 0x101010101010101 r12 0x746977735f747874 r13 0x0 r14 0xf0b07021a1e0d06 r15 0x30 r16 0x1 r17 0xffff8144704c r18 0x2 r19 0xa0 r20 0xffff32fdf0c0 r21 0x6 r22 0xfffefffef000 r23 0x0 r24 0xffff147c99a8 r25 0x4000063e78 r26 0xaaaab4536da0 r27 0x10 r28 0x40006021c0 r29 0xffff32fddb10 lr 0xffff814a1ff4 sp 0xffff32fdda80 pc 0xffff814a2008 fault 0x0 time=2025-10-07T13:42:12.503Z level=INFO source=sched.go:448 msg="Load failed" model=/ollama/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 error="do load request: Post \"http://127.0.0.1:42003/load\": EOF" time=2025-10-07T13:42:12.505Z level=ERROR source=server.go:421 msg="llama runner terminated" error="exit status 2" [GIN] 2025/10/07 - 13:42:12 | 500 | 2.086801712s | 127.0.0.1 | POST  "/api/generate"
@dhiltgen commented on GitHub (Oct 7, 2025):

One of the fixes in 0.12.4 switches away from the incorrectly reported free VRAM that the CUDA APIs return for iGPUs and instead uses overall system available memory. Based on the logs, this system has a large amount of available memory, yet initializing cublas fails, even though that should be a fairly small allocation. Most of what the cublasCreate call does is initialize the GPU, so perhaps the GPU is busy or having trouble on the system? Do you see any interesting logs in dmesg or /var/log/* which might provide more details?

@kentsuiGitHub did 0.12.3 (or a prior release) work on your system? If so, did we fall back to CPU, or did we load on the GPU?
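For anyone gathering the logs asked for above, a rough sketch (the grep patterns, log paths, and output filenames are illustrative guesses for a typical JetPack/Ubuntu setup; adjust to yours):

```shell
# Pull recent kernel messages that mention the Tegra GPU driver or memory
# pressure, plus matching system-log lines, into two files to attach here.
dmesg 2>/dev/null | grep -iE 'nvgpu|nvmap|oom|cuda' | tail -n 50 > gpu-dmesg.txt
grep -riE 'nvgpu|cuda' /var/log/syslog 2>/dev/null | tail -n 50 > gpu-syslog.txt
wc -l gpu-dmesg.txt gpu-syslog.txt
```

Note that dmesg may need root on recent kernels; run the first line with sudo if it comes back empty.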

@dhiltgen commented on GitHub (Oct 7, 2025):

I was mistaken. The problem here is that we're loading the wrong library: we're loading cuda_v12 when we should be loading cuda_jetpack6. I'll try to see if I can repro and find the root cause, but if you can run with OLLAMA_DEBUG=2, that will increase the verbosity of the discovery process.
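If you have the server output saved to a file, one quick check is which GGML backend libraries the runner actually loaded ("server.log" below is a placeholder for wherever you captured the logs):

```shell
# Print the backend libraries the runner loaded. On JetPack 6 the CUDA line
# should point at the cuda_jetpack6 directory, not the generic cuda_v12 build.
LOG=server.log  # adjust to your saved ollama server log
grep -o 'load_backend: loaded [A-Za-z]* backend from [^ ]*' "$LOG" \
  || echo "no load_backend lines found in $LOG"
```

In the log attached to this issue, that pattern matches the cuda_v12 path, which is the misdetection being described here.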

@kentsuiGitHub commented on GitHub (Oct 8, 2025):

> One of the fixes in 0.12.4 is switching from using incorrectly reported free VRAM in the CUDA APIs for iGPUs and instead using overall system available memory. Based on the logs, we believe this system has a large amount of available memory. However when we tried to initialize cublas, which should be a fairly small allocation, it fails. Most of what the cublasCreate call does is initialize the GPU, so perhaps the GPU is busy or having trouble on the system? Do you see any interesting logs in dmesg or /var/log/* which might provide more details?
>
> @kentsuiGitHub did 0.12.3 (or a prior release) work on your system? If so, did we fall back to CPU, or did we load on the GPU?

Yes, it is working on 0.12.3.

Reference: github-starred/ollama#34070