[GH-ISSUE #15418] gemma4 can't be launched #56370

Open
opened 2026-04-29 10:43:28 -05:00 by GiteaMirror · 4 comments
Originally created by @Bastiendsp on GitHub (Apr 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15418

What is the issue?

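gemma4 fails to load on a DGX Spark (latest version). The runner aborts with a GGML assertion failure in the CUDA copy kernel; see the log below.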

Relevant log output

➜  ollamaOpenUI docker logs ollama -f
time=2026-04-08T10:41:01.103Z level=INFO source=routes.go:1744 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:30 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-04-08T10:41:01.103Z level=INFO source=routes.go:1746 msg="Ollama cloud disabled: false"
time=2026-04-08T10:41:01.172Z level=INFO source=images.go:499 msg="total blobs: 3041"
time=2026-04-08T10:41:01.178Z level=INFO source=images.go:506 msg="total unused blobs removed: 0"
time=2026-04-08T10:41:01.179Z level=INFO source=routes.go:1802 msg="Listening on [::]:11434 (version 0.20.4-rc2)"
time=2026-04-08T10:41:01.179Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-08T10:41:01.180Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42575"
time=2026-04-08T10:41:01.599Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39589"
time=2026-04-08T10:41:02.041Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37319"
time=2026-04-08T10:41:02.041Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35827"
time=2026-04-08T10:41:02.490Z level=INFO source=types.go:42 msg="inference compute" id=GPU-a20637ba-815b-7e34-0ad0-6433b297b7cf filter_id="" library=CUDA compute=12.1 name=CUDA0 description="NVIDIA GB10" libdirs=ollama,cuda_v13 driver=13.0 pci_id=000f:01:00.0 type=iGPU total="121.7 GiB" available="116.9 GiB"
time=2026-04-08T10:41:02.490Z level=INFO source=routes.go:1852 msg="vram-based default context" total_vram="121.7 GiB" default_num_ctx=262144
[GIN] 2026/04/08 - 10:41:13 | 200 |    9.567209ms |      172.19.0.3 | GET      "/api/tags"
[GIN] 2026/04/08 - 10:41:13 | 200 |      78.832µs |      172.19.0.3 | GET      "/api/ps"
[GIN] 2026/04/08 - 10:42:28 | 200 |      35.553µs |      172.19.0.3 | GET      "/api/version"
time=2026-04-08T10:42:33.654Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44731"
time=2026-04-08T10:42:34.009Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
time=2026-04-08T10:42:34.132Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-08T10:42:34.132Z level=INFO source=server.go:259 msg="enabling flash attention"
time=2026-04-08T10:42:34.132Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 --port 34097"
time=2026-04-08T10:42:34.133Z level=INFO source=sched.go:484 msg="system memory" total="121.7 GiB" free="121.5 GiB" free_swap="15.8 GiB"
time=2026-04-08T10:42:34.133Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-a20637ba-815b-7e34-0ad0-6433b297b7cf library=CUDA available="115.8 GiB" free="116.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-08T10:42:34.133Z level=INFO source=server.go:771 msg="loading model" "model layers"=61 requested=-1
time=2026-04-08T10:42:34.141Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-08T10:42:34.141Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:34097"
time=2026-04-08T10:42:34.145Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:30 BatchSize:512 FlashAttention:Enabled KvSize:7864320 KvCacheType: NumThreads:20 GPULayers:61[ID:GPU-a20637ba-815b-7e34-0ad0-6433b297b7cf Layers:61(0..60)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-08T10:42:34.201Z level=INFO source=ggml.go:136 msg="" architecture=gemma4 file_type=Q4_K_M name="" description="" num_tensors=1189 num_key_values=49
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes, ID: GPU-a20637ba-815b-7e34-0ad0-6433b297b7cf
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-04-08T10:42:34.480Z level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-04-08T10:42:34.484Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-08T10:42:34.560Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.207893ms bounds=(0,0)-(2048,2048)
time=2026-04-08T10:42:34.633Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=72.125359ms size="[768 768]"
time=2026-04-08T10:42:34.633Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-08T10:42:34.633Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-08T10:42:34.633Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=73.994343ms shape="[5376 256]"
/ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
/usr/lib/ollama/libggml-base.so.0(+0x23070)[0xe7e4f0103070]
/usr/lib/ollama/libggml-base.so.0(ggml_print_backtrace+0x268)[0xe7e4f010304c]
/usr/lib/ollama/libggml-base.so.0(ggml_abort+0xe0)[0xe7e4f0101fe0]
/usr/lib/ollama/cuda_v13/libggml-cuda.so(_Z13ggml_cuda_cpyR25ggml_backend_cuda_contextPK11ggml_tensorPS1_+0x2e84)[0xe7e49c4a2cdc]
/usr/lib/ollama/cuda_v13/libggml-cuda.so(+0x11cefc)[0xe7e49c4ecefc]
/usr/lib/ollama/cuda_v13/libggml-cuda.so(+0x11e77c)[0xe7e49c4ee77c]
/usr/bin/ollama(+0x135c3d8)[0xbd307e1fc3d8]
/usr/bin/ollama(+0x12cdeb8)[0xbd307e16deb8]
/usr/bin/ollama(+0x40038c)[0xbd307d2a038c]
SIGABRT: abort
PC=0xe7e5389f7608 m=20 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 28 gp=0x4000103340 m=20 mp=0x4025780008 [syscall]:
runtime.cgocall(0xbd307e16de90, 0x40000470a8)
        runtime/cgocall.go:167 +0x44 fp=0x4000047060 sp=0x4000047020 pc=0xbd307d294df4
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_reserve(0xe7e4b50c0b00, 0xe7e25f08d480)
        _cgo_gotypes.go:1012 +0x34 fp=0x40000470a0 sp=0x4000047060 pc=0xbd307d733814
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve.func2(...)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:850
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0x40020a2040)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:850 +0xe0 fp=0x4000047330 sp=0x40000470a0 pc=0xbd307d73e3e0
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0x40005c8000, 0x1)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1169 +0x834 fp=0x4000047660 sp=0x4000047330 pc=0xbd307d8410b4
github.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0x40005c8000, {0xffffcfc57dec?, 0x0?}, {0x0, 0x14, {0x40000b9100, 0x1, 0x1}, 0x1}, {0x0?, ...}, ...)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1232 +0x2e4 fp=0x4000047710 sp=0x4000047660 pc=0xbd307d841784
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0x40005c8000, {0xbd307eac9d60, 0x400069b500}, 0x40006c6140)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1317 +0x460 fp=0x4000047aa0 sp=0x4000047710 pc=0xbd307d842150
github.com/ollama/ollama/runner/ollamarunner.(*Server).load-fm({0xbd307eac9d60?, 0x400069b500?}, 0x40005f5b28?)
        <autogenerated>:1 +0x40 fp=0x4000047ad0 sp=0x4000047aa0 pc=0xbd307d844000
net/http.HandlerFunc.ServeHTTP(0x40001c0540?, {0xbd307eac9d60?, 0x400069b500?}, 0x40005f5b10?)
        net/http/server.go:2294 +0x38 fp=0x4000047b00 sp=0x4000047ad0 pc=0xbd307d5605e8
net/http.(*ServeMux).ServeHTTP(0x10?, {0xbd307eac9d60, 0x400069b500}, 0x40006c6140)
        net/http/server.go:2822 +0x1b4 fp=0x4000047b50 sp=0x4000047b00 pc=0xbd307d562174
net/http.serverHandler.ServeHTTP({0xbd307eac5dd0?}, {0xbd307eac9d60?, 0x400069b500?}, 0x1?)
        net/http/server.go:3301 +0xbc fp=0x4000047b80 sp=0x4000047b50 pc=0xbd307d57de5c
net/http.(*conn).serve(0x400059a5a0, {0xbd307eacc5c8, 0x4000585c80})
        net/http/server.go:2102 +0x52c fp=0x4000047fa0 sp=0x4000047b80 pc=0xbd307d55ed8c
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3454 +0x30 fp=0x4000047fd0 sp=0x4000047fa0 pc=0xbd307d563f50
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000047fd0 sp=0x4000047fd0 pc=0xbd307d2a0594
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3454 +0x3d8

goroutine 1 gp=0x40000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400012f710 sp=0x400012f6f0 pc=0xbd307d298308
runtime.netpollblock(0x7000000000?, 0x6?, 0x0?)
        runtime/netpoll.go:575 +0x158 fp=0x400012f750 sp=0x400012f710 pc=0xbd307d25d0f8
internal/poll.runtime_pollWait(0xe7e4f05f7de0, 0x72)
        runtime/netpoll.go:351 +0xa0 fp=0x400012f780 sp=0x400012f750 pc=0xbd307d2974c0
internal/poll.(*pollDesc).wait(0x4000586900?, 0xbd307d320d98?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x400012f7b0 sp=0x400012f780 pc=0xbd307d31a338
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x4000586900)
        internal/poll/fd_unix.go:620 +0x24c fp=0x400012f860 sp=0x400012f7b0 pc=0xbd307d31ec0c
net.(*netFD).accept(0x4000586900)
        net/fd_unix.go:172 +0x28 fp=0x400012f920 sp=0x400012f860 pc=0xbd307d38db48
net.(*TCPListener).accept(0x4000646a40)
        net/tcpsock_posix.go:159 +0x24 fp=0x400012f970 sp=0x400012f920 pc=0xbd307d3a2fe4
net.(*TCPListener).Accept(0x4000646a40)
        net/tcpsock.go:380 +0x2c fp=0x400012f9b0 sp=0x400012f970 pc=0xbd307d3a1f7c
net/http.(*onceCloseListener).Accept(0x400059a5a0?)
        <autogenerated>:1 +0x30 fp=0x400012f9d0 sp=0x400012f9b0 pc=0xbd307d58a480
net/http.(*Server).Serve(0x40000c5400, {0xbd307eac9b80, 0x4000646a40})
        net/http/server.go:3424 +0x290 fp=0x400012fb00 sp=0x400012f9d0 pc=0xbd307d563bc0
github.com/ollama/ollama/runner/ollamarunner.Execute({0x40001a8030, 0x4, 0x4})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1453 +0x7fc fp=0x400012fcd0 sp=0x400012fb00 pc=0xbd307d843a2c
github.com/ollama/ollama/runner.Execute({0x40001a8010?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:18 +0x14c fp=0x400012fd10 sp=0x400012fcd0 pc=0xbd307d8d72dc
github.com/ollama/ollama/cmd.NewCLI.func3(0x40000c5100?, {0xbd307e481262?, 0x4?, 0xbd307e481266?})
        github.com/ollama/ollama/cmd/cmd.go:2267 +0x54 fp=0x400012fd40 sp=0x400012fd10 pc=0xbd307dff06d4
github.com/spf13/cobra.(*Command).execute(0x40005a3208, {0x40001f1270, 0x5, 0x5})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x400012fe60 sp=0x400012fd40 pc=0xbd307d3fd848
github.com/spf13/cobra.(*Command).ExecuteC(0x400059e008)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x400012ff20 sp=0x400012fe60 pc=0xbd307d3fdf90
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x54 fp=0x400012ff40 sp=0x400012ff20 pc=0xbd307dff1e54
runtime.main()
        runtime/proc.go:283 +0x284 fp=0x400012ffd0 sp=0x400012ff40 pc=0xbd307d2644a4
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400012ffd0 sp=0x400012ffd0 pc=0xbd307d2a0594

goroutine 2 gp=0x4000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008af90 sp=0x400008af70 pc=0xbd307d298308
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.forcegchelper()
        runtime/proc.go:348 +0xb8 fp=0x400008afd0 sp=0x400008af90 pc=0xbd307d2647f8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008afd0 sp=0x400008afd0 pc=0xbd307d2a0594
created by runtime.init.7 in goroutine 1
        runtime/proc.go:336 +0x24

goroutine 3 gp=0x4000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008b760 sp=0x400008b740 pc=0xbd307d298308
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.bgsweep(0x40000b6000)
        runtime/mgcsweep.go:316 +0x108 fp=0x400008b7b0 sp=0x400008b760 pc=0xbd307d24f028
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x28 fp=0x400008b7d0 sp=0x400008b7b0 pc=0xbd307d242e58
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008b7d0 sp=0x400008b7d0 pc=0xbd307d2a0594
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x6c

goroutine 4 gp=0x4000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0xbd307e6a84d0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008bf60 sp=0x400008bf40 pc=0xbd307d298308
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.(*scavengerState).park(0xbd307f4f3f40)
        runtime/mgcscavenge.go:425 +0x5c fp=0x400008bf90 sp=0x400008bf60 pc=0xbd307d24caec
runtime.bgscavenge(0x40000b6000)
        runtime/mgcscavenge.go:658 +0xac fp=0x400008bfb0 sp=0x400008bf90 pc=0xbd307d24d06c
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x28 fp=0x400008bfd0 sp=0x400008bfb0 pc=0xbd307d242df8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008bfd0 sp=0x400008bfd0 pc=0xbd307d2a0594
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xac

goroutine 18 gp=0x4000186380 m=nil [finalizer wait]:
runtime.gopark(0x18000001b8?, 0xe7e4f05ef468?, 0x78?, 0xa?, 0x1c0?)
        runtime/proc.go:435 +0xc8 fp=0x400008a590 sp=0x400008a570 pc=0xbd307d298308
runtime.runfinq()
        runtime/mfinal.go:196 +0x108 fp=0x400008a7d0 sp=0x400008a590 pc=0xbd307d241e58
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008a7d0 sp=0x400008a7d0 pc=0xbd307d2a0594
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:166 +0x80

goroutine 19 gp=0x4000186e00 m=nil [chan receive]:
runtime.gopark(0x40001f5b80?, 0x40001e4180?, 0x48?, 0x7f?, 0xbd307d366068?)
        runtime/proc.go:435 +0xc8 fp=0x40005e7ef0 sp=0x40005e7ed0 pc=0xbd307d298308
runtime.chanrecv(0x4000182310, 0x0, 0x1)
        runtime/chan.go:664 +0x42c fp=0x40005e7f70 sp=0x40005e7ef0 pc=0xbd307d233dec
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:506 +0x14 fp=0x40005e7fa0 sp=0x40005e7f70 pc=0xbd307d233984
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1799 +0x3c fp=0x40005e7fd0 sp=0x40005e7fa0 pc=0xbd307d24607c
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40005e7fd0 sp=0x40005e7fd0 pc=0xbd307d2a0594
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1794 +0x78

goroutine 20 gp=0x4000187180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000086f10 sp=0x4000086ef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x4000086fb0 sp=0x4000086f10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000086fd0 sp=0x4000086fb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000086fd0 sp=0x4000086fd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 34 gp=0x4000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf7917235f6?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400011a710 sp=0x400011a6f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400011a7b0 sp=0x400011a710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400011a7d0 sp=0x400011a7b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400011a7d0 sp=0x400011a7d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 5 gp=0x4000003880 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf79171b486?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008c710 sp=0x400008c6f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400008c7b0 sp=0x400008c710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400008c7d0 sp=0x400008c7b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008c7d0 sp=0x400008c7d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 21 gp=0x4000187340 m=nil [GC worker (idle)]:
runtime.gopark(0xbd307f5d0160?, 0x1?, 0xb1?, 0x92?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000087710 sp=0x40000876f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x40000877b0 sp=0x4000087710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000877d0 sp=0x40000877b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000877d0 sp=0x40000877d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 22 gp=0x4000187500 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf79172b9f6?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000087f10 sp=0x4000087ef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x4000087fb0 sp=0x4000087f10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000087fd0 sp=0x4000087fb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000087fd0 sp=0x4000087fd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 35 gp=0x4000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf79172b9a6?, 0x3?, 0x85?, 0x3?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400011af10 sp=0x400011aef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400011afb0 sp=0x400011af10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400011afd0 sp=0x400011afb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400011afd0 sp=0x400011afd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 6 gp=0x4000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf79172a476?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008cf10 sp=0x400008cef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400008cfb0 sp=0x400008cf10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400008cfd0 sp=0x400008cfb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008cfd0 sp=0x400008cfd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 23 gp=0x40001876c0 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf791714016?, 0x3?, 0xe8?, 0x84?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000088710 sp=0x40000886f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x40000887b0 sp=0x4000088710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000887d0 sp=0x40000887b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000887d0 sp=0x40000887d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 36 gp=0x4000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf791731486?, 0x3?, 0x52?, 0x2a?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400011b710 sp=0x400011b6f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400011b7b0 sp=0x400011b710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400011b7d0 sp=0x400011b7b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400011b7d0 sp=0x400011b7d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 7 gp=0x4000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf79172ed56?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008d710 sp=0x400008d6f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400008d7b0 sp=0x400008d710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400008d7d0 sp=0x400008d7b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008d7d0 sp=0x400008d7d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 24 gp=0x4000187880 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf791725ba6?, 0x3?, 0xe8?, 0x1?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000088f10 sp=0x4000088ef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x4000088fb0 sp=0x4000088f10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000088fd0 sp=0x4000088fb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000088fd0 sp=0x4000088fd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 37 gp=0x40001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf791728e86?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400011bf10 sp=0x400011bef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400011bfb0 sp=0x400011bf10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400011bfd0 sp=0x400011bfb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400011bfd0 sp=0x400011bfd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 8 gp=0x4000003dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf791726f36?, 0x3?, 0x30?, 0xe0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008df10 sp=0x400008def0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400008dfb0 sp=0x400008df10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400008dfd0 sp=0x400008dfb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008dfd0 sp=0x400008dfd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 9 gp=0x40000c6000 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf7917b1a09?, 0x1?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000116710 sp=0x40001166f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x40001167b0 sp=0x4000116710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40001167d0 sp=0x40001167b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40001167d0 sp=0x40001167d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 25 gp=0x4000187a40 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf791735876?, 0x1?, 0x2b?, 0xc5?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000089710 sp=0x40000896f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x40000897b0 sp=0x4000089710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000897d0 sp=0x40000897b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000897d0 sp=0x40000897d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 38 gp=0x4000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf7917ad7f8?, 0x3?, 0xea?, 0xf9?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400011c710 sp=0x400011c6f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400011c7b0 sp=0x400011c710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400011c7d0 sp=0x400011c7b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400011c7d0 sp=0x400011c7d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 10 gp=0x40000c61c0 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf791731246?, 0x1?, 0xbd?, 0x17?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000116f10 sp=0x4000116ef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x4000116fb0 sp=0x4000116f10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000116fd0 sp=0x4000116fb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000116fd0 sp=0x4000116fd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 26 gp=0x4000187c00 m=nil [GC worker (idle)]:
runtime.gopark(0xbd307f5d0160?, 0x1?, 0xd0?, 0xc9?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000089f10 sp=0x4000089ef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x4000089fb0 sp=0x4000089f10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000089fd0 sp=0x4000089fb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000089fd0 sp=0x4000089fd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 39 gp=0x4000102c40 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf79172dbb6?, 0x3?, 0x51?, 0x8a?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400011cf10 sp=0x400011cef0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x400011cfb0 sp=0x400011cf10 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400011cfd0 sp=0x400011cfb0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400011cfd0 sp=0x400011cfd0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 11 gp=0x40000c6380 m=nil [GC worker (idle)]:
runtime.gopark(0x9bf791707046?, 0x3?, 0x37?, 0xb7?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000117710 sp=0x40001176f0 pc=0xbd307d298308
runtime.gcBgMarkWorker(0x4000183570)
        runtime/mgc.go:1423 +0xdc fp=0x40001177b0 sp=0x4000117710 pc=0xbd307d2452ec
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40001177d0 sp=0x40001177b0 pc=0xbd307d2451d8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40001177d0 sp=0x40001177d0 pc=0xbd307d2a0594
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 27 gp=0x4000103180 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0xbd307f50cf80?, 0x0?, 0x20?, 0x81?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400009da90 sp=0x400009da70 pc=0xbd307d298308
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.semacquire1(0x40005c80b8, 0x0, 0x1, 0x0, 0x18)
        runtime/sema.go:188 +0x204 fp=0x400009dae0 sp=0x400009da90 pc=0xbd307d278944
sync.runtime_SemacquireWaitGroup(0x0?)
        runtime/sema.go:110 +0x2c fp=0x400009db20 sp=0x400009dae0 pc=0xbd307d299dac
sync.(*WaitGroup).Wait(0x40005c80b0)
        sync/waitgroup.go:118 +0x70 fp=0x400009db40 sp=0x400009db20 pc=0xbd307d2abc70
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x40005c8000, {0xbd307eacc600, 0x40001f1360})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:442 +0x38 fp=0x400009dfa0 sp=0x400009db40 pc=0xbd307d83bd38
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1430 +0x30 fp=0x400009dfd0 sp=0x400009dfa0 pc=0xbd307d843c50
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009dfd0 sp=0x400009dfd0 pc=0xbd307d2a0594
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1430 +0x448

goroutine 12 gp=0x4000582e00 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40005d1d80 sp=0x40005d1d60 pc=0xbd307d298308
runtime.netpollblock(0x0?, 0xffffffff?, 0xff?)
        runtime/netpoll.go:575 +0x158 fp=0x40005d1dc0 sp=0x40005d1d80 pc=0xbd307d25d0f8
internal/poll.runtime_pollWait(0xe7e4f05f7cc8, 0x72)
        runtime/netpoll.go:351 +0xa0 fp=0x40005d1df0 sp=0x40005d1dc0 pc=0xbd307d2974c0
internal/poll.(*pollDesc).wait(0x4000586980?, 0x4000585d81?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x40005d1e20 sp=0x40005d1df0 pc=0xbd307d31a338
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x4000586980, {0x4000585d81, 0x1, 0x1})
        internal/poll/fd_unix.go:165 +0x1fc fp=0x40005d1ec0 sp=0x40005d1e20 pc=0xbd307d31b5ec
net.(*netFD).Read(0x4000586980, {0x4000585d81?, 0x0?, 0x0?})
        net/fd_posix.go:55 +0x28 fp=0x40005d1f10 sp=0x40005d1ec0 pc=0xbd307d38c118
net.(*conn).Read(0x400008e1d0, {0x4000585d81?, 0x0?, 0x0?})
        net/net.go:194 +0x34 fp=0x40005d1f60 sp=0x40005d1f10 pc=0xbd307d399854
net/http.(*connReader).backgroundRead(0x4000585d70)
        net/http/server.go:690 +0x40 fp=0x40005d1fb0 sp=0x40005d1f60 pc=0xbd307d559700
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x28 fp=0x40005d1fd0 sp=0x40005d1fb0 pc=0xbd307d5595e8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40005d1fd0 sp=0x40005d1fd0 pc=0xbd307d2a0594
created by net/http.(*connReader).startBackgroundRead in goroutine 28
        net/http/server.go:686 +0xc4

r0      0x0
r1      0x6b
r2      0x6
r3      0xe7e224dd9140
r4      0xe7e538faeb50
r5      0x1
r6      0x20
r7      0xe7e224dd78d0
r8      0x83
r9      0x0
r10     0xa
r11     0x101010101010101
r12     0xe7e224dd7960
r13     0x0
r14     0x0
r15     0x1
r16     0x1
r17     0xe7e538997d0c
r18     0xe7e49c88c368
r19     0x6b
r20     0xe7e224dd9140
r21     0x6
r22     0xe7e49c973728
r23     0xe7e224dd8690
r24     0xe7e25f2b9290
r25     0xe7e4b50c3298
r26     0xe7e25f2b9290
r27     0xe7e22a2638d0
r28     0xee
r29     0xe7e224dd7860
lr      0xe7e5389f75f4
sp      0xe7e224dd7850
pc      0xe7e5389f7608
fault   0x0
time=2026-04-08T10:43:17.652Z level=ERROR source=server.go:1219 msg="do load request" error="Post \"http://127.0.0.1:34097/load\": EOF"
time=2026-04-08T10:43:17.652Z level=ERROR source=server.go:1219 msg="do load request" error="Post \"http://127.0.0.1:34097/load\": dial tcp 127.0.0.1:34097: connect: connection refused"
time=2026-04-08T10:43:17.652Z level=INFO source=sched.go:511 msg="Load failed" model=/root/.ollama/models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-04-08T10:43:17.731Z level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2"
[GIN] 2026/04/08 - 10:43:17 | 500 | 44.283491447s |      172.19.0.3 | POST     "/api/chat"

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.20.3

GiteaMirror added the bug label 2026-04-29 10:43:28 -05:00

@rick-github commented on GitHub (Apr 8, 2026):

Does reducing OLLAMA_NUM_PARALLEL or setting OLLAMA_CONTEXT_LENGTH help?


@Ketzemot commented on GitHub (Apr 15, 2026):

Hi, I'm encountering the same problem.

Here is the information for reproducing and potentially fixing this:

Reproducing: Gemma 4 26B crashes on load with OLLAMA_NUM_PARALLEL=35 on dual RTX 6000 Ada (96 GB VRAM)

OS: Windows 11
GPU: 2x NVIDIA RTX 6000 Ada Generation (48 GB each, 96 GB total)
CUDA: 12.8 (Driver 572.60)
Ollama: v0.20.5
Model: Gemma 4 26B A4B (Q8_0)
Relevant flags: OLLAMA_NUM_PARALLEL=35, OLLAMA_NEW_ENGINE=1, OLLAMA_FLASH_ATTENTION=1

Crash Log
cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
msg="do load request" error="Post \"http://127.0.0.1:4152/load\": read tcp 127.0.0.1:4170->127.0.0.1:4152: wsarecv: An existing connection was forcibly closed by the remote host."
msg="Load failed" error="model failed to load, this may be due to resource limitations or an internal error"

Root Cause
This is not a VRAM shortage — there is 82 GB of free VRAM at load time. The crash is caused by a 32-bit integer overflow in the CUDA copy kernel.
The KV cache pre-allocated for 35 parallel slots produces a single tensor larger than INT_MAX (~2 GB). The copy kernel in cpy.cu uses int for the byte count, so any tensor exceeding 2,147,483,647 bytes triggers the assertion and crashes the runner.
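A minimal sketch of the size check described above, written in Python rather than the kernel's CUDA/C++. The tensor shape (`ctx_per_slot`, `n_kv_heads`, `head_dim`, `bytes_per_elem`) and the helper names are hypothetical and are not Gemma 4's real configuration; only the `INT_MAX` bound comes from the assertion in the crash log. It assumes, as this comment describes, that one cache tensor spans all parallel slots, so its byte count grows linearly with `OLLAMA_NUM_PARALLEL`.

```python
INT_MAX = 2**31 - 1  # 2,147,483,647, the bound named in the assertion


def tensor_nbytes(n_slots, ctx_per_slot, n_kv_heads, head_dim, bytes_per_elem):
    """Bytes in one contiguous cache tensor covering all parallel slots."""
    return n_slots * ctx_per_slot * n_kv_heads * head_dim * bytes_per_elem


def passes_copy_check(nbytes):
    """Same condition as the assertion: ggml_nbytes(src0) <= INT_MAX."""
    return nbytes <= INT_MAX


# Hypothetical shape chosen for illustration only (128 MiB per slot).
shape = dict(ctx_per_slot=32768, n_kv_heads=8, head_dim=256, bytes_per_elem=2)

for slots in (8, 35):
    nbytes = tensor_nbytes(slots, **shape)
    print(slots, nbytes, passes_copy_check(nbytes))
```

With these assumed numbers, 8 slots stay under the bound and 35 slots exceed it, even though both would fit comfortably in 96 GB of VRAM.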

With the KV cache near the INT_MAX boundary, VRAM fragmentation at load time can cause partial spill to system RAM, resulting in inconsistent inference speed throughout the day.

Current Workaround
Reducing OLLAMA_NUM_PARALLEL from 35 to 8 keeps each KV cache tensor under 2 GB and resolves both the crash and the performance inconsistency.
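Assuming the cache tensor's size scales linearly with the slot count, as the root-cause description implies, the largest slot count that stays under the limit can be estimated by integer division. The `bytes_per_slot` figure below is made up for illustration and is not a measured value for this model; the real per-slot size depends on context length, model shape, and cache type. The function names are hypothetical as well.

```python
INT_MAX = 2**31 - 1  # byte limit from the failing assertion


def max_parallel_slots(bytes_per_slot: int) -> int:
    """Largest slot count whose combined tensor stays <= INT_MAX bytes."""
    if bytes_per_slot <= 0:
        raise ValueError("bytes_per_slot must be positive")
    return INT_MAX // bytes_per_slot


def clamp_parallel(requested: int, bytes_per_slot: int) -> int:
    """Reduce a requested slot count to fit the limit, never below 1."""
    return max(1, min(requested, max_parallel_slots(bytes_per_slot)))


# Hypothetical: 128 MiB of cache per slot.
bytes_per_slot = 128 * 1024 * 1024
print(clamp_parallel(35, bytes_per_slot))
print(clamp_parallel(8, bytes_per_slot))
```

A value computed this way is only a starting point for `OLLAMA_NUM_PARALLEL`; the setting that actually works still has to be confirmed by loading the model.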

Fix is Available Upstream
This bug has already been fixed in llama.cpp. The CUDA copy kernel was updated to use int64_t instead of int, removing the 2 GB tensor size limitation entirely:

  • ggml-org/llama.cpp#18140: Original bug report (crash at large context sizes on CUDA)
  • ggml-org/llama.cpp#18341: Fix confirmed and merged (Jan 1, 2026)

Could this upstream fix be pulled into Ollama's bundled GGML backend? This would unblock high-parallelism deployments on multi-GPU setups that have more than enough VRAM but are currently limited by the 32-bit integer constraint.
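To illustrate why the width of the byte counter matters, the sketch below stores a size above `INT_MAX` in a 32-bit and a 64-bit signed integer using Python's `ctypes`. It is not the kernel code from `cpy.cu`, and the 3 GiB size is an arbitrary example rather than a value taken from the logs.

```python
import ctypes

INT_MAX = 2**31 - 1
nbytes = 3 * 1024**3  # arbitrary example: 3 GiB, above INT_MAX

# ctypes integer types do no overflow checking, so the 32-bit value wraps
# the way a narrowing conversion to a C `int` typically does.
as_int32 = ctypes.c_int32(nbytes).value
as_int64 = ctypes.c_int64(nbytes).value

print(nbytes > INT_MAX)    # the size exceeds the 32-bit signed range
print(as_int32)            # wrapped value, no longer the real size
print(as_int64 == nbytes)  # the 64-bit type holds the size exactly
```

The assertion in the crash log stops the runner before a wrapped count like this could be used, which is why the failure appears as an abort at load time and not as corrupted data.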

@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15418
Analyzed: 2026-04-18T18:21:01.909098

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


@ivaigult commented on GitHub (Apr 19, 2026):

This seems to be a duplicate of #13887. It should have been fixed by ggml-org/llama.cpp#18433.

Reference: github-starred/ollama#56370